Sunday, February 22, 2015

IPv6 and IPv4 name resolution with Go

As part of a work-related project, I'm writing code that needs to resolve DNS names using Go, on illumos.

While doing this work, I noticed a very surprising thing.  When a host has both IPv6 and IPv4 addresses associated with a name (such as localhost), Go prefers to resolve to the IPv4 version of the name, unless one has asked specifically for v6 names.

This flies in the face of existing practice on illumos & Solaris systems, where resolving a name tends to give an IPv6 result, assuming that any IPv6 address is plumbed on the system.  (And on modern installations, that is the default -- at least the loopback interface of ::1 is always plumbed by default.  And not only that, but services listening on that address will automatically serve up both v6 and v4 clients that connect on either ::1 or 127.0.0.1.)

The rationale for this logic is buried in the Go net/ipsock.go file, in the comments for the function firstFavoriteAddr():
        // We'll take any IP address, but since the dialing
        // code does not yet try multiple addresses
        // effectively, prefer to use an IPv4 address if
        // possible. This is especially relevant if localhost
        // resolves to [ipv6-localhost, ipv4-localhost]. Too
        // much code assumes localhost == ipv4-localhost.
This is a really surprising result.  If you want to get IPv6 names by default, with Go, you could use the net.ResolveIPAddr() (or ResolveTCPAddr() or ResolveUDPAddr()) functions with the network type of "ip6", "tcp6", or "udp6" first.  Then if that resolution fails, you can try the standard versions, or the v4 specific versions (doing the latter is probably slightly more efficient.)  Here's what that code looks like:
        package main

        import (
                "fmt"
                "net"
        )

        func main() {
                name := "localhost"
                // First we try IPv6.  Note that we *hope* this fails if the host
                // stack does not actually support IPv6.
                ip, err := net.ResolveIPAddr("ip6", name)
                if err != nil {
                        // IPv6 not found, maybe IPv4?
                        ip, err = net.ResolveIPAddr("ip4", name)
                }
                fmt.Println(ip, err)
        }
However, this choice turns out to also be evil, because while ::1 often works locally as an IPv6 address and is functional, other names, for example www.google.com, will resolve to IPv6 addresses which will not work unless you have a clean IPv6 path all the way through.  For example, the above gives me this for www.google.com: 2607:f8b0:4007:804::1010, but if I try to telnet to it, it won't work -- no route to host.  (Of course, because I don't have an IPv6 path to the Internet; both my home gateway and my ISP are IPv4 only.)


It's kind of sad that the Go people felt that they had to make this choice -- at some level it robs the administrator of the choice, and encourages the existing broken code to remain so.  I'm not sure what the other systems use, but at least on illumos, we have a stack that understands the situation, and resolves optimally for the given situation of the user.  Sadly, Go shoves that intelligence aside, and uses its own overrides.

One moral of the story here is -- always use either explicit v4 or v6 addresses if you care, or ensure that your names resolve properly.
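Following that advice in Go is straightforward when the address is a literal: parse it, pick the matching family-specific network, and no resolver preference can get in the way. A small sketch; explicitTCP is a hypothetical helper, while net.ParseIP and net.JoinHostPort are standard library functions.

```go
package main

import (
	"fmt"
	"net"
)

// explicitTCP turns an IP literal and port into a family-specific network
// name ("tcp4" or "tcp6") and a dialable address, so no name lookup (and no
// resolver preference) is involved at all.
func explicitTCP(literal, port string) (network, addr string, err error) {
	ip := net.ParseIP(literal)
	if ip == nil {
		return "", "", fmt.Errorf("%q is not an IP literal", literal)
	}
	network = "tcp6"
	if ip.To4() != nil {
		network = "tcp4"
	}
	// JoinHostPort adds the brackets an IPv6 literal needs, e.g. "[::1]:80".
	return network, net.JoinHostPort(ip.String(), port), nil
}

func main() {
	for _, lit := range []string{"127.0.0.1", "::1", "localhost"} {
		network, addr, err := explicitTCP(lit, "80")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(network, addr) // these could be passed to net.Dial(network, addr)
	}
}
```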

Wednesday, February 11, 2015

Rise of mangos

What is mangos?


Those of you who follow me may have heard about this project I've created called mangos.

Mangos is a lightweight messaging system designed to be wire compatible with nanomsg, but is implemented entirely in Go.

There is a very nice write up of mangos by Tyler Treat, which might help explain some things.

Recent Activity


As a consequence of a few things, the last two weeks have seen a substantial rise in the use of mangos.

First off, there was Tyler's excellent article.  (By the way, he's done a series comparing and contrasting other messaging systems -- highly recommended reading.)

Second, mangos got mentioned on Hacker News.  That drew a large number of visitors to my github repo.

Then another open source project, Goq, switched from using libnanomsg to mangos, using the compatibility shim I provided for such use.  As a consequence of that work, several bugs were identified, and subsequently squashed.

The upshot of all that is that I saw the number of unique visitors skyrocket.  On Saturday Feb 7, there were over 2500 unique visitors to the github page, and 29 unique people took clones.  Sunday it tapered sharply to just over 1k visitors, and today there were only 7.  Peaks rarely get sharper than that.

Improvements


Over the past week or so I've made a large number of changes and improvements.  Recently, mangos has grown support for RFC 6455 (websocket), including websocket over TLS, and has had numerous internal improvements.
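Part of what any RFC 6455 transport has to do is the opening handshake, in which the server proves it understood the client's request by hashing the client's Sec-WebSocket-Key together with a fixed GUID. A minimal sketch of just that step, using only the Go standard library; acceptKey is a hypothetical helper name, and this is not the mangos websocket code.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed value RFC 6455 specifies for the handshake.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey computes the Sec-WebSocket-Accept value a server returns for a
// client's Sec-WebSocket-Key: base64(SHA-1(key + GUID)).
func acceptKey(clientKey string) string {
	sum := sha1.Sum([]byte(clientKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// The sample key and expected answer are the worked example in RFC 6455.
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")) // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}
```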

Some of these changes have broken API.  If you use mangos, I'm sorry about the breakage -- please let me know if you're hurt by this.  (I have created tagged releases for v1.0.0 and v1.1.0 in an attempt to mitigate the risk, but tip still has some interesting changes in it.)

Unlike libnanomsg, mangos (tip only) can notify you when a connection is added or removed, and you can access interesting information about the connection.  This is in the Port API.
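The value of a connection-level notification is easiest to see in a toy model. The sketch below is entirely hypothetical: connEvent, connInfo, connHook and socket are names made up to illustrate a hook that hears about every added or removed connection and can veto an add. They are not the mangos Port API; take its real types and signatures from the mangos documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical event kinds; NOT the real mangos identifiers.
type connEvent int

const (
	connAdded connEvent = iota
	connRemoved
)

// connInfo is the "interesting information" a hook might see.
type connInfo struct {
	RemoteAddr string
	Transport  string
}

// connHook is called on every add/remove; returning false from an add
// event rejects the connection.
type connHook func(ev connEvent, c connInfo) bool

type socket struct {
	mu    sync.Mutex
	hook  connHook
	conns map[string]connInfo
}

func newSocket(h connHook) *socket {
	return &socket{hook: h, conns: map[string]connInfo{}}
}

// addConn registers c unless the hook vetoes it. The hook runs with the
// lock held, so it must not call back into the socket.
func (s *socket) addConn(c connInfo) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.hook != nil && !s.hook(connAdded, c) {
		return false
	}
	s.conns[c.RemoteAddr] = c
	return true
}

// removeConn forgets a connection and notifies the hook.
func (s *socket) removeConn(addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.conns[addr]
	if !ok {
		return
	}
	delete(s.conns, addr)
	if s.hook != nil {
		s.hook(connRemoved, c)
	}
}

func main() {
	s := newSocket(func(ev connEvent, c connInfo) bool {
		fmt.Println("event", ev, c.RemoteAddr, c.Transport)
		// Veto plain-text connections; always allow removals.
		return ev == connRemoved || c.Transport == "tls+tcp"
	})
	s.addConn(connInfo{RemoteAddr: "10.0.0.1:5555", Transport: "tcp"})
	s.addConn(connInfo{RemoteAddr: "10.0.0.2:5555", Transport: "tls+tcp"})
	s.removeConn("10.0.0.2:5555")
}
```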

Futures


We are using mangos internally at Lucera, and I know now of several cases of production use.  This is kind of scary at one level, since I wrote this originally as a hobby project about a year ago (to learn Go.)  But it has become useful -- frankly extending mangos is far far more pleasurable than working in the C libnanomsg implementation -- a lot of this is thanks to Go, which is utterly pleasurable to work in (no matter how bad the guts may be reputed to be).  Being able to write a new TLS transport, or even websocket, in the course of an afternoon or two (actually for TLS it was more like an hour), is really nice.

I'm hoping that more people will find it useful, and that folks who want to experiment with the underlying messaging patterns may find it easier to work with than the C code.  Ideally, there will be more collaborators here, as we start exploring new directions for this stuff.

In the meantime, I'm going to continue to work to improve and extend mangos, because it's become one of the tools at my day job.  It's nice when work and pleasure come together!

Thursday, November 13, 2014

A better illumos...

If you follow illumos very closely, you may already know some of this.

A New Fork


Several months ago, I forked illumos-gate (the primary source code repository for the kernel and system components of illumos) into illumos-core.

I had started upstreaming my work from illumos-core into illumos-gate.  I've since ceased that effort, largely because I simply have no time for the various arguments that my work often generates.  I think this is largely because my vision for illumos is somewhat different from that of other folks, and sadly illumos proper lacks anything resembling a guiding vision now, which means that only entirely non-contentious changes can get integrated into illumos.

However, I still want to proceed apace with illumos-core, because I believe that work has real value, and I firmly believe that my vision for illumos is the one that will lead to greater adoption by users, and by distributors as well, since much of what I'm trying to achieve in illumos-gate is aimed at reducing barriers to adoption and to developers both of illumos itself and of systems that want to build on top of or integrate illumos.  (An example of reducing barriers to adoption -- I recently implemented a BSD compatible flock() within libc, which is sometimes used by applications developed for BSD or Linux.)

Relationship to Upstream


I do also invite other parties to cherry-pick from illumos-core into illumos-gate.  I suspect that a large number of the enhancements I've made, such as the support for the fexecve() function specified by POSIX 2008, are likely to be more widely useful.  Within illumos-core, I want to retain a high standard of quality, and facilitate the effort of upstreaming for those who want to make the effort to do so.

I do want to reiterate that unlike other projects that have forked from illumos, it is not my intent to divorce myself from the community -- rather I see illumos-core as an experimental branch aimed at exploring new directions that I ultimately hope will be embraced by the wider illumos community some day; by doing this in a separate repository/branch/fork, illumos-core can drive towards these goals without getting mired in questions that would prevent progress on these goals within illumos-gate proper.

The focus here is on delivery, rather than on discussion.  (In fact, one of my taglines on social media has for many years been "Code first, questions later."  The illumos-core effort represents a return to that core value.)

Call for Participation


I'm also interested in having co-collaborators on this project.  The goals are large, and while I hope to achieve them someday even if I have to do it all myself, I'm certain that the project will move quite a lot faster with help.  Also, because of our lack of bureaucracy, I hope that illumos-core can be an easier path to integration than illumos-gate.  I just use a simple github pull-request for integration at present.

There is an opportunity for folks at all different technical levels to participate.  We need work that involves systems programming, but also there is work around documentation, research, shell scripting, test development and release engineering to be performed.  I'm happy to mentor folks who want to help out, based on their skill level.

And, of course, for folks who want to focus primarily on improving illumos-gate upstream, there is effort that could be spent to figure out what to cherry-pick and to do the various illumos-gate process wrangling steps to get those bits integrated.

Friday, October 17, 2014

Your language sucks...

As a result of work I've been doing for illumos, I've recently gotten re-engaged with internationalization, and the support for this in libc and localedef (I am the original author for our localedef.)

I've decided that human languages suck.  Some suck worse than others though, so I thought I'd write up a guide.  You can take this as "your language sucks if...", or perhaps a better view might be "your program sucks if you make assumptions this breaks..."

(Full disclosure, I'm spoiled.  I am a native speaker of English.  English is pretty awesome for data-processing, at least at the written level.  I'm not going to concern myself with questions about deeper issues like grammar, natural language recognition, speech synthesis, or recognition, automatic translation, etc.  Instead this is focused strictly on the most basic display and simple operations like collation (sorting), case conversion, and character classification.)

1. Too many code points. 

Some languages (from Eastern Asia) have way way too many code points.  There are so many that these languages can't actually fit into 16-bits all by themselves.  Yes, I'm saying that there are languages with over 65,000 characters in them!  This explosion means that generating data for languages results in intermediate lookup tables that are megabytes in size.  For Unicode, this impacts all languages.  The intermediate sources for the Unicode supported in illumos blow up to over 2GB when support for the additional code planes is included.
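Go makes the consequence easy to see: anything outside the Basic Multilingual Plane, such as U+20000 (the first ideograph in CJK Extension B), needs four bytes in UTF-8 and a surrogate pair (two 16-bit units) in UTF-16. A small sketch using unicode/utf8 and unicode/utf16; encodedSizes is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// encodedSizes reports how many UTF-8 bytes and UTF-16 code units r needs.
func encodedSizes(r rune) (utf8Bytes, utf16Units int) {
	return utf8.RuneLen(r), len(utf16.Encode([]rune{r}))
}

func main() {
	for _, r := range []rune{'A', '中', '\U00020000'} {
		b, u := encodedSizes(r)
		fmt.Printf("U+%04X: %d UTF-8 bytes, %d UTF-16 units\n", r, b, u)
	}
}
```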

2. Your language requires me to write custom code for symbol names. 

Hangul Jamo, I'm looking at you.  Of all the languages in Unicode, only this one is so bizarre that it requires multiple lookup tables to determine the names of the characters, because the characters are made up of smaller bits of phonetic portions (vowels and consonants.)  It even has its own section in the basic conformance document for Unicode (section 3.12).  I don't speak Korean, but I had to learn about Jamo.
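The algorithm in that section derives a precomposed syllable's name arithmetically: subtract 0xAC00, split the offset into leading-consonant, vowel and trailing-consonant indexes, and glue together short Jamo names. A sketch in Go, with the short-name tables transcribed from the Unicode data (double-check them against Jamo.txt before relying on this); hangulSyllableName is a hypothetical helper.

```go
package main

import "fmt"

const (
	sBase  = 0xAC00
	lCount = 19
	vCount = 21
	tCount = 28
	nCount = vCount * tCount // 588 syllables per leading consonant
	sCount = lCount * nCount // 11172 precomposed syllables
)

// Short Jamo names for leading consonants, vowels and trailing consonants.
var (
	jamoL = []string{"G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "", "J", "JJ", "C", "K", "T", "P", "H"}
	jamoV = []string{"A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"}
	jamoT = []string{"", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C", "K", "T", "P", "H"}
)

// hangulSyllableName computes the character name of a precomposed syllable.
func hangulSyllableName(r rune) (string, bool) {
	s := int(r) - sBase
	if s < 0 || s >= sCount {
		return "", false
	}
	l, v, t := s/nCount, (s%nCount)/tCount, s%tCount
	return "HANGUL SYLLABLE " + jamoL[l] + jamoV[v] + jamoT[t], true
}

func main() {
	for _, r := range []rune{'\uAC00', '\uD55C', '\uD4DB'} {
		name, _ := hangulSyllableName(r)
		fmt.Printf("U+%04X %c %s\n", r, r, name)
	}
}
```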

3. Your language's character set is continuing to evolve. 

Yes, that's Asia again (mostly China I think).   The rate at which new Asian characters are added rivals that of updates to the timezone database.  The approach your language uses is wrong!

4. Characters in your language are of multiple different cell widths. 

Again, this is mostly, but not exclusively, Asian languages.  Asian languages require 2 cells to display many of their characters.  But, to make matters far far worse, sometimes the number of code points used to represent a character is more than one, which means that the width of a single code point when displayed may be 0, 1, or 2 cells.   Worse, some languages have both half- and full-width forms for many common symbols.  Argh.
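A program that lays out text in a terminal therefore needs something like wcwidth(). The rough heuristic below shows the three cases: 0 for combining marks (and conjoining Jamo vowels and trailing consonants), 2 for wide East Asian characters and fullwidth forms, 1 otherwise. The ranges are an incomplete approximation of what real wcwidth() implementations use, and cellWidth, isWide and stringWidth are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWide approximates the East Asian wide and fullwidth ranges.
func isWide(r rune) bool {
	return (r >= 0x1100 && r <= 0x115F) || // Hangul Jamo leading consonants
		(r >= 0x2E80 && r <= 0xA4CF && r != 0x303F) || // CJK radicals through Yi
		(r >= 0xAC00 && r <= 0xD7A3) || // Hangul syllables
		(r >= 0xF900 && r <= 0xFAFF) || // CJK compatibility ideographs
		(r >= 0xFF00 && r <= 0xFF60) || // fullwidth forms
		(r >= 0xFFE0 && r <= 0xFFE6) || // fullwidth signs
		(r >= 0x20000 && r <= 0x3FFFD) // supplementary ideographic planes
}

// cellWidth returns the number of terminal cells r occupies, or -1 for
// control characters, which have no printable width.
func cellWidth(r rune) int {
	switch {
	case r == 0:
		return 0
	case r < 0x20 || (r >= 0x7F && r < 0xA0):
		return -1
	case unicode.Is(unicode.Mn, r) || unicode.Is(unicode.Me, r):
		return 0 // combining marks attach to the previous character
	case r >= 0x1160 && r <= 0x11FF:
		return 0 // Jamo vowels/trailing consonants join the leading consonant
	case isWide(r):
		return 2
	default:
		return 1
	}
}

// stringWidth sums cell widths, returning -1 if s contains a control character.
func stringWidth(s string) int {
	total := 0
	for _, r := range s {
		w := cellWidth(r)
		if w < 0 {
			return -1
		}
		total += w
	}
	return total
}

func main() {
	for _, s := range []string{"abc", "e\u0301", "漢字", "ＡＢＣ"} {
		fmt.Printf("%q occupies %d cells\n", s, stringWidth(s))
	}
}
```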

5. The width of the character depends on the context. 

Some widths depend on the encoding because of historical practice (Asia again!), but then you have composite characters as well.  For example, a Jamo vowel sound could in theory be displayed on its own.  But if it follows a leading consonant, then it changes the consonant character and they become a new character (at least to the human viewer).
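Unicode handles the Jamo case with an arithmetic composition rule (also from section 3.12 of the standard): a leading consonant, a vowel and an optional trailing consonant combine into one precomposed syllable. A minimal sketch in Go; composeHangul is a hypothetical helper, and a full normalizer (NFC) does considerably more than this.

```go
package main

import "fmt"

const (
	sBase  = 0xAC00 // first precomposed syllable
	lBase  = 0x1100 // first leading consonant (choseong)
	vBase  = 0x1161 // first vowel (jungseong)
	tBase  = 0x11A7 // one before the first trailing consonant (jongseong)
	lCount = 19
	vCount = 21
	tCount = 28
)

// composeHangul combines leading consonant l, vowel v and optional trailing
// consonant t (pass 0 for none) into a precomposed syllable.
func composeHangul(l, v, t rune) (rune, bool) {
	li, vi := l-lBase, v-vBase
	if li < 0 || li >= lCount || vi < 0 || vi >= vCount {
		return 0, false
	}
	var ti rune
	if t != 0 {
		ti = t - tBase
		if ti <= 0 || ti >= tCount {
			return 0, false
		}
	}
	return sBase + (li*vCount+vi)*tCount + ti, true
}

func main() {
	// U+1112 + U+1161 + U+11AB display as the single syllable U+D55C.
	r, ok := composeHangul(0x1112, 0x1161, 0x11AB)
	fmt.Printf("%c U+%04X %v\n", r, r, ok)
}
```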

6. Your language has unstable case conversions.

There are some evil ones here, and thankfully they are rare.  But some languages have case conversions which are not reversible!  Case itself is kind of silly, but this is just insane!  Armenian has a letter with this property, I believe.
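Go's unicode package, which applies the one-to-one (simple) case mappings, turns up several letters where converting to the other case and back does not return the original. None of the examples checked below are Armenian: they are Greek final sigma, Turkish dotless i, the long s, and the Kelvin sign. caseRoundTrips is a hypothetical helper in this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseRoundTrips reports whether r survives a trip to the opposite case and back.
func caseRoundTrips(r rune) bool {
	if unicode.IsUpper(r) {
		return unicode.ToUpper(unicode.ToLower(r)) == r
	}
	return unicode.ToLower(unicode.ToUpper(r)) == r
}

func main() {
	for _, r := range []rune{'a', 'A', 'ς', 'ı', 'ſ', '\u212A'} {
		fmt.Printf("%c U+%04X round-trips: %v\n", r, r, caseRoundTrips(r))
	}
}
```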

7. Your language's collation order is context-dependent. 

(French, I'm looking at you!)  Some languages have sorting orders that depend not just on the character itself, but on the characters that precede or follow it.  Some of the rules are really hard.  The collation code required to deal with this generally is really really scary looking.

8. Your language has equivalent alternates (ligatures). 

German, your ß character, which stands in for "ss", is a poster child here.  This is a single code point, but for sorting it is equivalent to "ss".  This is just historical decoration, because it's "fancy".  Stop making my programming life hard.

9. Your language can't decide on a script. 

Some languages can be written in more than one script.  For example, Mongolian can be written using Mongolian script or Cyrillic.  But the winner (loser?) here is Serbian, which in some places uses both Latin and Cyrillic characters interchangeably! Pick a script already! I think the people who live like this are just schizophrenic.  (Given all the political nonsense surrounding language in these places, that's no real surprise.)

10. Your language has Titlecase. 

POSIX doesn't do Titlecase.  This happens because your language also uses ligatures instead of just allocating a separate cell and code point for each character.  Most people talk about titlecase used in a phrase or string of words.  But yes, titlecase can apply to a SINGLE CHARACTER.  For example, ǅ (U+01C5) is just such a character.

11. Your language doesn't use the same display / ordering we expect.

So some languages use right to left, which is backwards, but whatever.   Others, crazy ones (but maybe crazy smart, if you think about it) use back and forth bidirectional.  And still others use vertical ordering.  But the worst of them are those languages (Asia again, dammit!) where the orientation of text can change.  Worse, some cases even rotate individual characters, depending upon context (e.g. titles are rotated 90 degrees and placed on the right edge).  How did you ever figure out how to use a computer with this crazy stuff?

12. Your encoding collides control codes.

We use the first 32 or so character codes to mean special things for terminal control, etc.  If we can't use these, your language is going to suck over certain kinds of communication lines.

13. Your encoding uses conflicting values at ASCII code points.

ASCII is universal.  Why did you fight it?  But that's probably just me being mostly Anglo-centric / bigoted.

14. Your language encoding uses shift characters. 

(Code page, etc.)  Some East Asian languages used this hack in the old days.  Stateful encodings are JUST HORRIBLY BROKEN.   A given sequence of characters should not depend on some state value that was sent a long time earlier.
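ISO-2022-JP is a surviving example of a stateful encoding: the escape sequence ESC $ B switches into two-byte JIS X 0208 mode and ESC ( B switches back to ASCII, so the same bytes mean different things depending on what came earlier. The hypothetical splitter below tracks only those two escapes, which is enough to show the problem: the bytes 0x30 0x21 are "0!" in ASCII mode but the kanji 亜 in JIS X 0208 mode.

```go
package main

import "fmt"

// segment is a run of bytes to be interpreted under one ISO-2022-JP mode.
type segment struct {
	mode string // "ascii" or "jis0208"
	data []byte
}

// splitISO2022JP splits b at the two escape sequences it understands:
// ESC $ B (enter JIS X 0208) and ESC ( B (return to ASCII).
func splitISO2022JP(b []byte) []segment {
	mode := "ascii" // an ISO-2022-JP stream starts in ASCII
	var segs []segment
	var cur []byte
	flush := func() {
		if len(cur) > 0 {
			segs = append(segs, segment{mode, cur})
			cur = nil
		}
	}
	for i := 0; i < len(b); i++ {
		if b[i] == 0x1B && i+2 < len(b) {
			switch string(b[i+1 : i+3]) {
			case "$B":
				flush()
				mode = "jis0208"
				i += 2
				continue
			case "(B":
				flush()
				mode = "ascii"
				i += 2
				continue
			}
		}
		cur = append(cur, b[i])
	}
	flush()
	return segs
}

func main() {
	// Identical byte pairs, decoded under different states.
	in := []byte("\x1b$B\x30\x21\x1b(B0!")
	for _, s := range splitISO2022JP(in) {
		fmt.Printf("%-8s % x\n", s.mode, s.data)
	}
}
```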

15. Your language encoding uses zero values in the middle of valid characters. 

Thankfully this doesn't happen with modern encodings in common use anymore.  (Or maybe I just have decided that I won't support any encoding system this busted.  Such an encoding is so broken that I just flat out refuse to work with it.)

Non-Broken Languages


So, there are some good examples of languages that are famously not broken.

a. English.  Written English has simple sorting rules, and a very simple character set.  Diphthongs are never ligatures.  This is so useful for data processing that I think it has had a great deal to do with why English is the common language for computer scientists around the world.  US-ASCII, an English character set, is the "base" character set for Unicode, and pretty much all other encodings use ASCII encodings in the lower 7 bits.

b. Russian.  (And likely others that use Cyrillic, but not all of them!)  Russian has a very simple alphabet, strictly phonetic.  The number of characters is small, there are no composite characters, and no special sorting rules.  Hmm... I seem to recall that Russia (Soviet era) had a pretty robust computing industry.  And these days Russians mostly own the Internet, right?  Coincidence?  Or maybe they just don't have to waste a lot of time fighting with the language just to get stuff done?

I think there are probably others.  (At a glance, Georgian looks pretty straightforward.)  I suspect that there are languages using both Cyrillic and Latin character sets that are sane.  Ethiopic actually looks pretty simple and sane too.  (Again, just from a text processing standpoint.)

But sadly, the vast majority of natural languages have written forms & rules that completely and utterly suck for text processing.

Sunday, October 12, 2014

My Problem with Feminism

I'm going to say some things here that may be controversial.  Certainly that headline is.  But please, bear with me, and read this before you judge too harshly.

As another writer said, 2014 has been a terrible year for women in tech.  (Whether in the industry, or in gaming.)  Arguably, this is not a new thing, but rather events are reaching a head.  Women (some at any rate) are being more vocal, and awareness of women's issues is up.  On the face of it, this should be a good thing.

And yet, we have incredible conflict between women and men.  And this is at the heart of my problem with "Feminism".

The F-Word


Don't get me wrong.  I strongly believe that women should be treated fairly and with respect; in the professional place they should receive the same level of professional respect -- and compensation! -- as their male counterparts can expect.  I believe this passionately -- as a nerd, I prefer to judge people on the merits of their work, rather than on their race, creed, gender, or sexual preference.  A similar principle applies to gaming -- after all, how do you really know the gender of the player on the other side of the MMO?  Does it even matter?  When did gaming become a venue for channeling hate instead of fun?

The problem with "feminism" is that instead of repairing inequality and trying to bring men and women closer together, so much of it seems to be divisive.  The very word itself basically suggests a gender based conflict, and I think this, as well as much of the recent approach, is counterproductive.

Instead of calling attention to inequalities and improper behaviors (let's face it, nobody wants to deal with sexual harassment, discrimination, or some of the very much worse behavior that a few terribly bad actors are guilty of), we've become focused on gender bias and "fixing" gender bias as a goal in and of itself, rather than instead focusing on fair and equal treatment for all.

Every day I'm inundated with tweets and Facebook postings extolling the terrible plight of women at the expense of men.  Many of these posts seem intended to make me either angry at men, or ashamed of being one.  This basically drives a wedge between people, even unconsciously, to the point that it has become impossible to avoid being a soldier on one side or the other of this war.  And don't get me wrong, it has indeed degenerated to a total war.

I don't think this is what most feminists or their advocates really want.  (Though, I think it is what some of them want.  The side of feminism has its bad actors who thrive on conflict just as much as the other side has.  Extremism is gender and color and religion blind, as we have ample evidence of.)

I think one thing that advocates for women in tech can do is to pick a different term, and a different way of stating their goals, and perhaps a different approach.  I think we've reached the critical mass necessary for awareness, so the constant tweets about how terrible it is to be a woman are no longer helpful.

I'm not sure what "term" should replace feminism -- in the workplace I'd suggest "professionalism".  After all everyone wants to be treated professionally, not just women.  (Btw, I'd say that in the gaming community, the value should be "sportsmanship".  Sadly, some will see that word as gender biased, but I don't subscribe to the notion that we have to completely change our language in order to be more politically correct.  You know what I mean.)

Likewise, instead of dog-piling (as I'm sure will happen in response to this post) on someone who doesn't immediately appear to support the feminist agenda, perhaps a little more tolerance and education should be used in the approach.  Focus should, IMO, be on public praise for the parties who are working to make conditions better.

Educate instead of punish.  Make allies instead of enemies.

Salary Gap


The salary gap issue that was raised recently by Microsoft is another case in point.

I don't agree with Satya Nadella's comments saying that women should not ask for raises, but I think many women are nearly as likely to get a raise upon requesting one as a man of similar accomplishments.  (Yes, it would be better if this statement could have been said without "nearly".)   Far too few women feel comfortable asking for a merit based raise in the first place -- that is something that should change. But using race or gender as a basis to demand pay increases is a recipe for further division.  Indeed, men may begin to wonder if women are being compensated unfairly because they are women, but in the reverse direction.

Likewise, bringing up discrimination in a salary discussion puts the other party on the defensive.  It presumes to imply prior wrong-doing.  This may be the case, but it may well not be.  After all, I've known many men that were under compensated simply because they sold themselves short, or were not comfortable asking for more money.   Why look for a fight when there isn't one?  (I suspect this is what Satya was really trying to get at.)

None of this helps the cause of "professionalism", and probably not the cause of "feminism".

Average tech salary figures are easily obtainable.  If a worker, man or woman, feels under compensated -- for any reason -- then they should take it to their employer and ask for a correction.  But to presume that the reason is gender starts the conversation from a point of conflict.

Far far better is to demand fair pay based on work performance and merit, relative to industry norms as appropriate.   If an employer won't compensate fairly, just leave.  There is no shortage of tech jobs in the industry.  If you're a woman, maybe look for jobs at companies that employ (and successfully retain) women.  Ask the people who work at a prospective employer about conditions, etc.  That's true for minorities too!  Ultimately, an employer who discriminates will find itself at a severe competitive disadvantage, as both the discriminated-against parties and their allies refuse to do business with them.

An employer is not obligated to pay you "more" because of your gender.  But they must also not pay you less because of gender.  And yet every company will generally try to pay as little as they think they can get away with.  So don't let them -- but keep discrimination out of the conversation unless there is really compelling proof of wrong doing.  (And if there is such evidence, I'd recommend looking elsewhere, and possibly explore stronger legal measures.)

And yes, I strongly strongly believe that most men feel as I do.  They support the notion that everyone should be treated equally and professionally, and would like to stamp out sexism in the workplace, but many of us are starting to show symptoms of battle fatigue, and even more of us just don't want to be involved in a conflict at all.   Frankly, I think a lot of us are annoyed at feminist attempts to draw us into the conflict, even though we do support many of the stated goals of equal pay, fair treatment, etc. etc.

Closing Thoughts

As for me, I support the plight of women who find themselves discriminated against based on their gender, and I would like to see more women in my industry.  And I've put my money where my mouth is. 

But at the same time, you won't find me supporting "feminism".  I want to heal the rift, and work with awesome people -- and I happen to believe at least half of the awesome people in the world are of a different gender than I am.  Why would I want to alienate them?

I happen to believe that many well meaning people of many causes damage their cause by basically forcing people to deal with their "diversity" first, instead of being able to deal with people as people on their own merit.  It's so much harder to appreciate a person on her own merits, when at least half of what she is saying is that she's unfairly treated because of gender, race, sexual preference, etc.  This is true for everyone.  Show me how you're excellent, and I promise to appreciate you for your awesomeness, and to treat you fairly and with the same respect I would for anyone of my own gender/race/sexual preference.

You are awesome because of your accomplishments/innovations/contributions, not because of your gender or race or sexual preference.

But, if you won't let me look past your race/gender/etc. identity, then please don't be offended if I don't see anything else.  If you want to be treated like a "person", then let me see the person instead of just some classification in an equal opportunity survey.

Thursday, October 2, 2014

Supporting Women in Open Source

Please have a look at Sage Weil's blog post on supporting the Ada Initiative, which supports women in open source development.

Sage is sponsoring an $8192 matching grant, to support women in open source development of open storage technology.

You may have heard my talk recently, where I expressed that there have been no female contributions to illumos (that includes ZFS by the way!)  This is kind of a tragedy; intelligence and creativity of at least half the population are simply not represented here, and we are worse for it.

If you want to try to do something about it, here's a small thing.  There's a week remaining to do so, so I encourage folks to step up.  ($3392 has already been granted.)

I'm making a donation myself; if you think supporting more women in open source is a worthwhile cause, please join me!

Sunday, September 7, 2014

Modernizing "less"

I have just spent an all-nighter doing something I didn't expect to do.

I've "modernized" less(1).  (That link is to the changeset.)

First off, let me explain the motivation.  We need a pager for illumos that can meet the requirements for POSIX (IEEE Std 1003.1-2008) more(1).  We have a suitable pager (barely), in closed source form only, delivered into /usr/xpg4/bin/more.  We have an open source /usr/bin/more, but it is incredibly limited, hearkening back to the days of printed hard copy I think.  (It even has Microsoft copyrights in it!)

So closed source is kind of a no go for me.

less(1) looks attractive.  It's widely used, and has been used to fill in for more(1) to achieve POSIX compliance on other systems (such as MacOS X.)

So I started by unpacking it into our tree, and trying to get it to work with an illumos build system.

That's when I discovered the crazy contortions autoconf was doing that basically wound up leaving it with just legacy BSD termcap.   Ewww.   I wanted it to use X/Open Curses.

When I started trying to do that, I found that there were severe crasher bugs in less, involving the way it uses scratch buffer space.  I started trying to debug just that problem, but pretty soon the effort mushroomed.

Legacy less supports all kinds of crufty and ancient systems.   Systems like MS-DOS (actually many different versions with different compiler options!) and Ultrix and OS/2 and OS9, and OSK, etc.  In fact, it apparently had to support systems where the C preprocessor didn't understand #elif, so the #ifdef maze was truly nightmarish.  The code is K&R style C even.

I decided it was high time to modernize this application for POSIX systems.  So I went ahead and did a sweeping update.  In the process I ditched thousands of lines of code (the screen handling bits in screen.c are less than half as big as they were).

So, now it:


  • Speaks terminfo (X/Open Curses) instead of ancient BSD termcap
  • Uses glob(3C) instead of a hack involving the shell and a helper program (lessecho, which I've removed from my tree.)
  • Functions properly as /usr/bin/more, both with and without -e (even on broken xterms)
  • Is fully ANSI C (or ISO C, if you prefer)
  • Passes illumos' cstyle code style checks
  • Is lint(1) clean


There is more work to do in the future if someone wants to.  Here are the ideas for the future:


  • Internationalization.  This is a pretty easy task involving gettext().
  • Make less use getopt() instead of its byzantine option parser (it needed that for PC operating systems.  We don't need or want this complexity on POSIX.)
  • Fix its character set handling so it can use the mbstring and wcstring routines in the platform instead of relying on its own implementation of UTF-8.  (This would make it support other multibyte locales.)
  • Make it support port events instead of sleeping when acting in "tail -f" mode.
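For comparison, in Go the equivalent of replacing a hand-rolled option parser with getopt() is the standard flag package. The sketch below parses two of the POSIX more options, -e and -n number. parseMoreArgs and the options struct are hypothetical, and unlike getopt() the flag package does not accept clustered options such as -en 20.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// options holds the parsed command line.
type options struct {
	exitAtEOF bool     // -e
	lines     int      // -n number
	files     []string // remaining operands
}

// parseMoreArgs parses args (without the program name).
func parseMoreArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("more", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors to the caller instead of stderr
	fs.BoolVar(&o.exitAtEOF, "e", false, "exit after the last line of the last file")
	fs.IntVar(&o.lines, "n", 0, "number of lines per screenful")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.files = fs.Args()
	return o, nil
}

func main() {
	o, err := parseMoreArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "usage error:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```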


If someone wants to pick up any of this work, let me know.  I'm happy to advise.  Oh, and this isn't in illumos proper yet.  It's unclear when, if ever, it will get into illumos -- I expect a lot of grief from people who think I shouldn't have forked this project, and I'm not interested in having a battle with them.  The upstream has to be a crazy maze because of the platforms it has to support.  We can do better, and I think this was a worthwhile case.  (In any event, I now know quite a lot more about less internals than I did before.  Not that this is a good thing.)