Tuesday, April 14, 2015

vtprint - blast from the past

I recently decided to have a look back at some old stuff I wrote.  Blast from the past stuff.  One of the things I decided to look at was the very first open source program I wrote -- something called vtprint.

vtprint is a tool that borrowed ideas from the pine mail program.  This comes from the days of serial dialups, and legacy (Kermit or ProComm for those who remember such things) serial connectivity over 14.4K modems.  Printing a file on a remote server was hard back then; you could transfer the file with xmodem via rx(1), or its "newer" variants, rb(1) or rz(1), but this was awkward.  It turns out that most physical terminals had support for an escape sequence that would start hardcopy to a locally attached printer, and another that would stop it.  And indeed, many terminal emulators have the same support.  (I added support for this myself to the rxvt(1) terminal emulator back in the mid 90's, so any emulators derived from rxvt inherit this support as well.)

So vtprint, which in retrospect could have been written in a few lines of shell code, is a C program.  It supports configuration via a custom file called "vtprintcap", that can provide information about different $TERM values and the escape sequences they use.  So for example, TVI925 terminals use a different escape sequence than VT100 or VT220 terminals.
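To give a feel for how little is involved, here is a minimal sketch in Go.  It is not the vtprint source.  It assumes the VT100/VT220-style "media copy" sequences (ESC [ 5 i to start printing, ESC [ 4 i to stop); other terminals need different codes, which is what vtprintcap is for.  The names wrapForPrinter, printerOn, and printerOff are invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// Assumed escape sequences for VT100/VT220-style terminals.
// Terminals such as the TVI925 use different codes.
const (
	printerOn  = "\x1b[5i" // start routing output to the attached printer
	printerOff = "\x1b[4i" // stop routing output to the attached printer
)

// wrapForPrinter brackets data with the printer on/off sequences,
// optionally adding a form feed so the last page is ejected.
func wrapForPrinter(data []byte, formFeed bool) []byte {
	var buf bytes.Buffer
	buf.WriteString(printerOn)
	buf.Write(data)
	if formFeed {
		buf.WriteByte('\f')
	}
	buf.WriteString(printerOff)
	return buf.Bytes()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: vtsketch file...")
		return
	}
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, "vtsketch:", err)
			continue
		}
		// The terminal (or emulator) intercepts the bracketed bytes
		// and hands them to the local printer.
		os.Stdout.Write(wrapForPrinter(data, true))
	}
}
```

The framing is kept in its own function so that it can be checked without a terminal attached.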

The first version of it, 1.0, is lost in time, and was written back in 1991 or 1992 while I was working as a "Computer Consultant" (fancy name for BOFH, though I also did a lot of programming for the school's Windows 3.1 network and Novell NetWare 3.12 server) at the Haas School of Business at UC Berkeley.  (I was a Chemical Engineering student there, and I flunked out in thermodynamics -- I still blame Gibbs Free Energy ΔG as a concept I never truly could get my mind around.  It took a couple of years before I finally gave up fighting "hard engineering" and accepted my true calling as a software engineer.)

Anyway, I transferred to SDSU into the Computer Science department.  While there, back in 1993, I updated vtprint significantly, and that is the release that lives on at SourceForge.

Today I imported that project to github.  (And let me tell you, it took some hoops to do this.  Apparently all the old CVS projects that were going to be converted to git are supposed to have already done so, because at this point even the conversion tools are mostly crufty and broken.  Maybe I'll post what I did there.)  Most of the files indicate an age of 11 years ago.  That's because 11 years ago I imported the source into CVS for the benefit of SourceForge.  The actual files date back to 1994, and you can even see my old email address -- which hasn't worked for a couple of decades -- in the files.

So, I gave vtprint a whirl today for the first time in years:

garrett@hipster{4}> ./vtprint README
*** vtprint (v2.0.2) ***
Copyright 1993-1994, Garrett D'Amore
Last revised October 25, 1994.
NO WARRANTY!  Use "vtprint -w" for info.
Freely redistributable.  Use "vtprint -l" for info.

vtprint: Can't open /usr/local/lib/vtprint/vtprintcap, using builtin codes.
vtprint: Using <stdout> for output.
vtprint: Output flags: formfeed
vtprint: Printed README.
vtprint: Successfully printed 1 file (1 specified).

Sadly, MacOS X Terminal.app does not appear to emulate the escape sequences properly.  But iTerm2 does so very nicely.  It even generates a nice print preview using the standard MacOS X print dialog.  Sadly, it does not pass through PostScript unmolested, but for printing ASCII files it works beautifully.

I think that I'm going to start using vtprint again, when I need to print a file located on a virtual machine, or over a connection that only offers SSH (such as a secured access machine).  With firewalled networks, I suspect that this program has new usefulness.  I'm also switching from Terminal.app (as many people have suggested) to iTerm2.

Sunday, February 22, 2015

IPv6 and IPv4 name resolution with Go

As part of a work-related project, I'm writing code that needs to resolve DNS names using Go, on illumos.

While doing this work, I noticed a very surprising thing.  When a host has both IPv6 and IPv4 addresses associated with a name (such as localhost), Go prefers to resolve to the IPv4 version of the name, unless one has asked specifically for v6 names.

This flies in the face of existing practice on illumos & Solaris systems, where resolving a name tends to give an IPv6 result, assuming that any IPv6 address is plumbed on the system.  (And on modern installations, that is the default -- at least the loopback interface of ::1 is always plumbed by default.  And not only that, but services listening on that address will automatically serve up both v6 and v4 clients that connect on either ::1 or 127.0.0.1.)

The rationale for this logic is buried in the Go net/ipsock.go file, in comments for the function firstFavoriteAddr():
        // We'll take any IP address, but since the dialing
        // code does not yet try multiple addresses
        // effectively, prefer to use an IPv4 address if
        // possible. This is especially relevant if localhost
        // resolves to [ipv6-localhost, ipv4-localhost]. Too
        // much code assumes localhost == ipv4-localhost.
This is a really surprising result.  If you want to get IPv6 names by default, with Go, you could use the net.ResolveIPAddr() (or ResolveTCPAddr() or ResolveUDPAddr()) functions with the network type of "ip6", "tcp6", or "udp6" first.  Then if that resolution fails, you can try the standard versions, or the v4 specific versions (doing the latter is probably slightly more efficient.)  Here's what that code looks like:
        name := "localhost"

        // First we try IPv6.  Note that we *hope* this fails if the host
        // stack does not actually support IPv6.
        // net.ResolveIPAddr returns (*net.IPAddr, error), in that order.
        ip, err := net.ResolveIPAddr("ip6", name)
        if err != nil {
                // IPv6 not found, maybe IPv4?
                ip, err = net.ResolveIPAddr("ip4", name)
        }
However, this choice turns out to also be evil, because while ::1 often works locally as an IPv6 address and is functional, other names, for example www.google.com, will resolve to IPv6 addresses which will not work unless you have a clean IPv6 path all the way through.  For example, the above gives me this for www.google.com: 2607:f8b0:4007:804::1010, but if I try to telnet to it, it won't work -- no route to host.  (Of course, that's because I don't have an IPv6 path to the Internet; both my home gateway and my ISP are IPv4 only.)


It's kind of sad that the Go people felt that they had to make this choice -- at some level it robs the choice from the administrator, and encourages the existing broken code to remain so.  I'm not sure what the other systems use, but at least on illumos, we have a stack that understands the situation, and resolves optimally for the given situation of the user.  Sadly, Go shoves that intelligence aside, and uses its own overrides.

One moral of the story here is -- always use either explicit v4 or v6 addresses if you care, or ensure that your names resolve properly.
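If you do care about the address family, one way to be explicit is to classify the literal addresses yourself rather than rely on whichever one the resolver hands back first.  The following is only a sketch: familyOf and pickAddress are hypothetical helper names, and it treats IPv4-mapped IPv6 literals (::ffff:a.b.c.d) as IPv4 because that is what net.IP.To4 does.

```go
package main

import (
	"fmt"
	"net"
)

// familyOf reports "ip4" or "ip6" for an IP literal, or "" if the
// string is not an IP literal at all (for example a host name).
func familyOf(literal string) string {
	ip := net.ParseIP(literal)
	switch {
	case ip == nil:
		return ""
	case ip.To4() != nil:
		return "ip4" // includes IPv4-mapped IPv6 literals
	default:
		return "ip6"
	}
}

// pickAddress returns the first candidate belonging to the wanted family.
func pickAddress(candidates []string, family string) (string, bool) {
	for _, c := range candidates {
		if familyOf(c) == family {
			return c, true
		}
	}
	return "", false
}

func main() {
	candidates := []string{"127.0.0.1", "::1"}
	for _, family := range []string{"ip6", "ip4"} {
		if addr, ok := pickAddress(candidates, family); ok {
			// JoinHostPort adds the brackets an IPv6 literal needs.
			fmt.Printf("%s -> %s\n", family, net.JoinHostPort(addr, "80"))
		}
	}
}
```

The matching "tcp4" or "tcp6" network name can then be passed to the dialing code so that the family is never left to a default.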

Wednesday, February 11, 2015

Rise of mangos

What is mangos?


Those of you who follow me may have heard about this project I've created called mangos.

Mangos is a lightweight messaging system designed to be wire compatible with nanomsg, but is implemented entirely in Go.

There is a very nice write up of mangos by Tyler Treat, which might help explain some things.

Recent Activity


As a consequence of a few things, the last two weeks have seen a substantial rise in the use of mangos.

First off, there was Tyler's excellent article.  (By the way, he's done a series comparing and contrasting other messaging systems -- highly recommended reading.)

Second, mangos got mentioned on Hacker News.  That drew a large number of visitors to my github repo.

Then another open source project, Goq, switched from using libnanomsg to mangos, using the compatibility shim I provided for such use.  As a consequence of that work, several bugs were identified, and subsequently squashed.

The upshot of all that is that I saw the number of unique visitors skyrocket.  On Saturday Feb 7, there were over 2500 unique visitors to the github page, and 29 unique people took clones.  Sunday it tapered sharply to just over 1k visitors, and today there were only 7.  Peaks rarely get sharper than that.

Improvements


Over the past week or so I've made a large number of changes and improvements.  Recently, mangos has grown support for RFC 6455 (websocket), including websocket over TLS, and has had numerous internal improvements.

Some of these changes have broken the API.  If you use mangos, I'm sorry about the breakage -- please let me know if you're hurt by this.  (I have created tagged releases for v1.0.0 and v1.1.0 in an attempt to mitigate the risk, but tip still has some interesting changes in it.)

Unlike libnanomsg, mangos (tip only) can notify you when a connection is added or removed, and you can access interesting information about the connection.  This is in the Port API.

Futures


We are using mangos internally at Lucera, and I know now of several cases of production use.  This is kind of scary at one level, since I wrote this originally as a hobby project about a year ago (to learn Go.)  But it has become useful -- frankly extending mangos is far far more pleasurable than working in the C libnanomsg implementation -- a lot of this is thanks to Go which is utterly pleasurable to work in (no matter how bad the guts may be reputed to be).  Being able to write a new TLS transport, or even websocket, in the course of an afternoon or two (actually for TLS it was more like an hour), is really nice.

I'm hoping that more people will find it useful, and that folks who want to experiment with the underlying messaging patterns may find it easier to work with than the C code.  Ideally, there will be more collaborators here, as we start exploring new directions for this stuff.

In the meantime, I'm going to continue to work to improve and extend mangos, because it's become one of the tools at my day job.  It's nice when work and pleasure come together!

Thursday, November 13, 2014

A better illumos...

If you follow illumos very closely, you may already know some of this.

A New Fork


Several months ago, I forked illumos-gate (the primary source code repository for the kernel and system components of illumos) into illumos-core.

I had started upstreaming my work from illumos-core into illumos-gate.  I've since ceased that effort, largely because I simply have no time for the various arguments that my work often generates.  I think this is largely because my vision for illumos is somewhat different from that of other folks, and sadly illumos proper lacks anything resembling a guiding vision now, which means that only entirely non-contentious changes can get integrated into illumos.

However, I still want to proceed apace with illumos-core, because I believe that work has real value, and I firmly believe that my vision for illumos is the one that will lead to greater adoption by users, and by distributors as well.  Much of what I'm trying to achieve in illumos-gate is aimed at reducing barriers to adoption, and barriers to developers, both of illumos itself and of systems that want to build on top of or integrate illumos.  (An example of reducing barriers to adoption -- I recently implemented a BSD compatible flock() within libc, which is sometimes used by applications developed for BSD or Linux.)

Relationship to Upstream


I do also invite other parties to cherry-pick from illumos-core into illumos-gate.  I suspect that a large number of the enhancements I've made, such as the support for the fexecve() function specified by POSIX 2008, are likely to be more widely useful.  Within illumos-core, I want to retain a high standard of quality, and facilitate the effort of upstreaming for those who want to make the effort to do so.

I do want to reiterate that unlike other projects that have forked from illumos, it is not my intent to divorce myself from the community -- rather I see this illumos-core as an experimental branch aimed at exploring new directions that I ultimately hope will be embraced by the wider illumos community some day; by doing this in a separate repository/branch/fork, illumos-core can drive towards these goals without getting mired in questions that would prevent progress on these goals within illumos-gate proper.

The focus here is on delivery, rather than on discussion.  (In fact, one of my taglines on social media has for many years been "Code first, questions later."  The illumos-core effort represents a return to that core value.)

Call for Participation


I'm also interested in having co-collaborators on this project.  The goals are large, and while I hope to achieve them someday even if I have to do it all myself, I'm certain that the project will move quite a lot faster with help.  Also, because of our lack of bureaucracy, I hope that illumos-core can be an easier path to integration than illumos-gate.  I just use a simple github pull-request for integration at present.

There is an opportunity for folks at all different technical levels to participate.  We need work that involves systems programming, but also there is work around documentation, research, shell scripting, test development and release engineering to be performed.  I'm happy to mentor folks who want to help out, based on their skill level.

And, of course, for folks who want to focus primarily on improving illumos-gate upstream, there is effort that could be spent to figure out what to cherry-pick and to do the various illumos-gate process wrangling steps to get those bits integrated.

Friday, October 17, 2014

Your language sucks...

As a result of work I've been doing for illumos, I've recently gotten re-engaged with internationalization, and the support for this in libc and localedef (I am the original author for our localedef.)

I've decided that human languages suck.  Some suck worse than others though, so I thought I'd write up a guide.  You can take this as "your language sucks if...", or perhaps a better view might be "your program sucks if you make assumptions this breaks..."

(Full disclosure, I'm spoiled.  I am a native speaker of English.  English is pretty awesome for data-processing, at least at the written level.  I'm not going to concern myself with questions about deeper issues like grammar, natural language recognition, speech synthesis, or recognition, automatic translation, etc.  Instead this is focused strictly on the most basic display and simple operations like collation (sorting), case conversion, and character classification.)

1. Too many code points. 

Some languages (from Eastern Asia) have way way too many code points.  There are so many that these languages can't actually fit into 16 bits all by themselves.  Yes, I'm saying that there are languages with over 65,000 characters in them!  This explosion means that generating data for languages results in intermediate lookup tables that are megabytes in size.  For Unicode, this impacts all languages.  The intermediate sources for the Unicode supported in illumos blow up to over 2GB when support for the additional code planes is included.
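The 16-bit limit is easy to see from any language with Unicode support.  Here is a small Go sketch (encodedSizes is a name made up for this example) that reports how many UTF-16 code units and UTF-8 bytes a code point needs; U+20000, the first code point of the CJK Extension B block, is used as the sample that does not fit in 16 bits.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// encodedSizes returns the number of UTF-16 code units and UTF-8 bytes
// needed to encode a single code point.
func encodedSizes(r rune) (utf16Units, utf8Bytes int) {
	return len(utf16.Encode([]rune{r})), utf8.RuneLen(r)
}

func main() {
	// An ASCII letter, a common CJK ideograph, and a supplementary-plane ideograph.
	for _, r := range []rune{'A', 0x4E2D, 0x20000} {
		u16, u8 := encodedSizes(r)
		fmt.Printf("%U: %d UTF-16 unit(s), %d UTF-8 byte(s)\n", r, u16, u8)
	}
}
```

Anything above U+FFFF needs a surrogate pair in UTF-16, which is exactly the case that code assuming "one character is 16 bits" gets wrong.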

2. Your language requires me to write custom code for symbol names. 

Hangul Jamo, I'm looking at you.  Of all the languages in Unicode, only this one is so bizarre that it requires multiple lookup tables to determine the names of the characters, because the characters are made up of smaller bits of phonetic portions (vowels and consonants.)  It even has its own section in the basic conformance document for Unicode (section 3.12).  I don't speak Korean, but I had to learn about Jamo.
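To show the kind of custom code this forces, here is a sketch of the syllable decomposition arithmetic described in that section of the Unicode standard.  It only splits a precomposed syllable into its jamo; building the character name additionally needs a table of short jamo names, which is not shown.  The function name decomposeHangul is an invention for this example; the constants are the ones the standard defines.

```go
package main

import "fmt"

// Constants for Hangul syllable arithmetic, as given in the Unicode standard.
const (
	sBase  = 0xAC00 // first precomposed syllable
	lBase  = 0x1100 // first leading consonant
	vBase  = 0x1161 // first vowel
	tBase  = 0x11A7 // one before the first trailing consonant
	lCount = 19
	vCount = 21
	tCount = 28
	nCount = vCount * tCount // 588 syllables per leading consonant
	sCount = lCount * nCount // 11172 syllables in total
)

// decomposeHangul splits a precomposed syllable into two or three jamo.
// ok is false when s is not a precomposed Hangul syllable.
func decomposeHangul(s rune) (jamo []rune, ok bool) {
	if s < sBase || s >= sBase+sCount {
		return nil, false
	}
	idx := s - sBase
	l := lBase + idx/nCount
	v := vBase + (idx%nCount)/tCount
	t := tBase + idx%tCount
	jamo = []rune{l, v}
	if t != tBase { // tBase itself means "no trailing consonant"
		jamo = append(jamo, t)
	}
	return jamo, true
}

func main() {
	for _, s := range []rune{0xAC00, 0xD55C} {
		jamo, _ := decomposeHangul(s)
		fmt.Printf("%U %c -> %U\n", s, s, jamo)
	}
}
```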

3. Your language's character set is continuing to evolve. 

Yes, that's Asia again (mostly China I think).   The rate at which new Asian characters are added rivals that of updates to the timezone database.  The approach your language uses is wrong!

4. Characters in your language are of multiple different cell widths. 

Again, this is mostly, but not exclusively, Asian languages.  Asian languages require 2 cells to display many of their characters.  But, to make matters far far worse, sometimes the number of code points used to represent a character is more than one, which means that the width of a character when displayed may be 0, 1, or 2 cells.   Worse, some languages have both half- and full-width forms for many common symbols.  Argh.
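The mismatch between bytes, code points, and what a human calls a character can be sketched in a few lines of Go.  This example only counts code points and nonspacing combining marks (the zero-width case); the Go standard library has no East Asian width table, so the 1-versus-2 cell question is not covered here.  countMarks is a name made up for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// countMarks returns the number of code points in s and how many of
// them are nonspacing combining marks (general category Mn), which
// normally occupy no display cell of their own.
func countMarks(s string) (codePoints, combining int) {
	for _, r := range s {
		codePoints++
		if unicode.Is(unicode.Mn, r) {
			combining++
		}
	}
	return codePoints, combining
}

func main() {
	samples := []string{
		"e\u0301", // 'e' followed by a combining acute accent
		"\u00e9",  // the same letter as one precomposed code point
		"\uff21",  // fullwidth Latin capital letter A
	}
	for _, s := range samples {
		cp, mn := countMarks(s)
		fmt.Printf("%+q: %d bytes, %d code points, %d combining\n", s, len(s), cp, mn)
	}
}
```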

5. The width of the character depends on the context. 

Some widths depend on the encoding because of historical practice (Asia again!), but then you have composite characters as well.  For example, a Jamo vowel sound could in theory be displayed on its own.  But if it follows a leading consonant, then it changes the consonant character and they become a new character (at least to the human viewer).

6. Your language has unstable case conversions.

There are some evil ones here, and thankfully they are rare.  But some languages have case conversions which are not reversible!  Case itself is kind of silly, but this is just insane!  Armenian has a letter with this property, I believe.
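A quick way to find such characters is to push a code point through a case conversion and back.  The sketch below uses Go's simple (one code point to one code point) case mappings; survivesRoundTrip is a hypothetical helper.  The samples are the Turkish dotless and dotted i and the Kelvin sign, not the Armenian letter mentioned above, because those are cases I am sure of.

```go
package main

import (
	"fmt"
	"unicode"
)

// survivesRoundTrip reports whether r comes back unchanged after being
// converted to the other case and back, using simple case mappings.
func survivesRoundTrip(r rune) bool {
	if unicode.IsUpper(r) {
		return unicode.ToUpper(unicode.ToLower(r)) == r
	}
	return unicode.ToLower(unicode.ToUpper(r)) == r
}

func main() {
	samples := []rune{
		'a',
		0x0131, // LATIN SMALL LETTER DOTLESS I
		0x0130, // LATIN CAPITAL LETTER I WITH DOT ABOVE
		0x212A, // KELVIN SIGN
	}
	for _, r := range samples {
		fmt.Printf("%U %c: round trip ok = %v\n", r, r, survivesRoundTrip(r))
	}
}
```

Full case mappings (where one code point can become several) and locale-specific rules make the picture worse, and are not modeled here.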

7. Your language's collation order is context-dependent. 

(French, I'm looking at you!)  Some languages have sorting orders that depend not just on the character itself, but on the characters that precede or follow it.  Some of the rules are really hard.  The collation code required to deal with this generally is really really scary looking.

8. Your language has equivalent alternates (ligatures). 

German, your ß character, which stands in for "ss", is a poster child here.  This is a single code point, but for sorting it is equivalent to "ss".  This is just historical decoration, because it's "fancy".  Stop making my programming life hard.

9. Your language can't decide on a script. 

Some languages can be written in more than one script.  For example, Mongolian can be written using Mongolian script or Cyrillic.  But the winner (loser?) here is Serbian, which in some places uses both Latin and Cyrillic characters interchangeably! Pick a script already! I think the people who live like this are just schizophrenic.  (Given all the political nonsense surrounding language in these places, that's no real surprise.)

10. Your language has Titlecase. 

POSIX doesn't do Titlecase.  This happens because your language also uses ligatures instead of just allocating a separate cell and code point for each character.  Most people talk about titlecase used in a phrase or string of words.  But yes, titlecase can apply to a SINGLE CHARACTER.  For example, Dž is just such a character.
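Go's unicode package does expose a titlecase mapping, which makes the three-way split easy to see.  In this sketch, caseForms and isTitlecase are names invented for the example; the code points are the three forms of the DŽ digraph.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseForms returns the lower, title, and upper case forms of r.
func caseForms(r rune) (lower, title, upper rune) {
	return unicode.ToLower(r), unicode.ToTitle(r), unicode.ToUpper(r)
}

// isTitlecase reports whether r is itself a titlecase letter (category Lt).
func isTitlecase(r rune) bool {
	return unicode.IsTitle(r)
}

func main() {
	// U+01C4, U+01C5, U+01C6: upper, title, and lower forms of the digraph.
	for _, r := range []rune{0x01C4, 0x01C5, 0x01C6, 'a'} {
		lo, ti, up := caseForms(r)
		fmt.Printf("%U: lower %U, title %U, upper %U, is titlecase: %v\n",
			r, lo, ti, up, isTitlecase(r))
	}
}
```

For most letters the title and upper forms are the same code point; the digraphs are the exception, which is why a two-case model cannot represent them.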

11. Your language doesn't use the same display / ordering we expect.

So some languages use right to left, which is backwards, but whatever.   Others, crazy ones (but maybe crazy smart, if you think about it) use back and forth bidirectional.  And still others use vertical ordering.  But the worst of them are those languages (Asia again, dammit!) where the orientation of text can change.  Worse, some cases even rotate individual characters, depending upon context (e.g. titles are rotated 90 degrees and placed on the right edge).  How did you ever figure out how to use a computer with this crazy stuff?

12. Your encoding collides with control codes.

We use the first 32 or so character codes to mean special things for terminal control, etc.  If we can't use these, your language is going to suck over certain kinds of communication lines.

13. Your encoding uses conflicting values at ASCII code points.

ASCII is universal.  Why did you fight it?  But that's probably just me being mostly Anglo-centric / bigoted.

14. Your language encoding uses shift characters. 

(Code page, etc.)  Some East Asian languages used this hack in the old days.  Stateful encodings are JUST HORRIBLY BROKEN.   A given sequence of characters should not depend on some state value that was sent a long time earlier.

15. Your language encoding uses zero values in the middle of valid characters. 

Thankfully this doesn't happen with modern encodings in common use anymore.  (Or maybe I just have decided that I won't support any encoding system this busted.  Such an encoding is so broken that I just flat out refuse to work with it.)

Non-Broken Languages


So, there are some good examples of languages that are famously not broken.

a. English.  Written English has simple sorting rules, and a very simple character set.  Diphthongs are never ligatures.  This is so useful for data processing that I think it has had a great deal to do with why English is the common language for computer scientists around the world.  US-ASCII -- an English character set -- is the "base" character set for Unicode, and pretty much all other encodings use ASCII encodings in the lower 7 bits.

b. Russian.  (And likely others that use Cyrillic, but not all of them!)  Russian has a very simple alphabet, strictly phonetic.  The number of characters is small, there are no composite characters, and no special sorting rules.  Hmm... I seem to recall that Russia (Soviet era) had a pretty robust computing industry.  And these days Russians mostly own the Internet, right?  Coincidence?  Or maybe they just don't have to waste a lot of time fighting with the language just to get stuff done?

I think there are probably others.  (At a glance, Georgian looks pretty straight-forward.   I suspect that there are languages using both Cyrillic and Latin character sets that are sane.  Ethiopic actually looks pretty simple and sane too.  Again, just from a text processing standpoint.)

But sadly, the vast majority of natural languages have written forms & rules that completely and utterly suck for text processing.

Sunday, October 12, 2014

My Problem with Feminism

I'm going to say some things here that may be controversial.  Certainly that headline is.  But please, bear with me, and read this before you judge too harshly.

As another writer said, 2014 has been a terrible year for women in tech.  (Whether in the industry, or in gaming.)  Arguably, this is not a new thing, but rather events are reaching a head.  Women (some at any rate) are being more vocal, and awareness of women's issues is up.  On the face of it, this should be a good thing.

And yet, we have incredible conflict between women and men.  And this is at the heart of my problem with "Feminism".

The F-Word


Don't get me wrong.  I strongly believe that women should be treated fairly and with respect; in the professional place they should receive the same level of professional respect -- and compensation! -- as their male counterparts can expect.  I believe this passionately -- as a nerd, I prefer to judge people on the merits of their work, rather than on their race, creed, gender, or sexual preference.  A similar principle applies to gaming -- after all, how do you really know the gender of the player on the other side of the MMO?  Does it even matter?  When did gaming become a venue for channeling hate instead of fun?

The problem with "feminism" is that instead of repairing inequality and trying to bring men and women closer together, so much of it seems to be divisive.  The very word itself basically suggests a gender based conflict, and I think this, as well as much of the recent approach, is counterproductive.

Instead of calling attention to inequalities and improper behaviors (let's face it, nobody wants to deal with sexual harassment, discrimination, or some of the very much worse behavior that a few terribly bad actors are guilty of), we've become focused on gender bias and "fixing" gender bias as a goal in and of itself, rather than focusing on fair and equal treatment for all.

Every day I'm inundated with tweets and Facebook postings extolling the terrible plight of women at the expense of men.  Many of these posts seem intended to make me either angry at men, or ashamed of being one.  This basically drives a wedge between people, even unconsciously, to the point that it has become impossible to avoid being a soldier on one side or the other of this war.  And don't get me wrong, it has indeed degenerated to a total war.

I don't think this is what most feminists or their advocates really want.  (Though, I think it is what some of them want.  The side of feminism has its bad actors who thrive on conflict just as much as the other side has.  Extremism is gender and color and religion blind, as we've ample evidence of.)

I think one thing that advocates for women in tech can do, is to pick a different term, and a different way of stating their goals, and perhaps a different approach.  I think we've reached the critical mass necessary for awareness, so the constant tweets about how terrible it is to be a woman are no longer helpful.

I'm not sure what "term" should replace feminism -- in the workplace I'd suggest "professionalism".  After all everyone wants to be treated professionally, not just women.  (Btw, I'd say that in the gaming community, the value should be "sportsmanship".  Sadly some will see that word as gender biased, but I don't subscribe to the notion that we have to completely change our language in order to be more politically correct.  You know what I mean.)

Likewise, instead of dog piling (as I'm sure will happen in response to this post) on someone who doesn't immediately appear to support the feminist agenda, perhaps a little more tolerance and education should be used in the approach.  Focus should, IMO, be on public praise for the parties who are working to make conditions better.

Educate instead of punish.  Make allies instead of enemies.

Salary Gap


The salary gap issue that was raised recently by Microsoft is another case in point.

I don't agree with Satya Nadella's comments saying that women should not ask for raises, but I think many women are nearly as likely to get a raise upon requesting one as a man of similar accomplishments.  (Yes, it would be better if this statement could have been said without "nearly".)   Far too few women feel comfortable asking for a merit based raise in the first place -- that is something that should change. But using race or gender as a bias to demand pay increases is a recipe for further division.  Indeed, men may begin to wonder if women are being compensated unfairly because they are women, but in the reverse direction. 

Likewise, bringing up discrimination in a salary discussion puts the other party on the defensive.  It presumes to imply prior wrong-doing.  This may be the case, but it may well not be.  After all, I've known many men that were under compensated simply because they sold themselves short, or were not comfortable asking for more money.   Why look for a fight when there isn't one?  (I suspect this is what Satya was really trying to get at.)

None of this helps the cause of "professionalism", and probably not the cause of "feminism".

Average tech salary figures are easily obtainable.  If a worker, man or woman, feels under compensated -- for any reason -- then they should take it to their employer and ask for a correction.  But to presume that the reason is gender starts the conversation from a point of conflict.

Far far better is to demand fair pay based on work performance and merit, relative to industry norms as appropriate.   If an employer won't compensate fairly, just leave.  There is no shortage of tech jobs in the industry.  If you're a woman, maybe look for jobs at companies that employ (and successfully retain) women.  Ask the people who work at a prospective employer about conditions, etc.  That's true for minorities too!  Ultimately, an employer who discriminates will find itself at a severe competitive disadvantage, as both the discriminated-against parties and their allies refuse to do business with them.

An employer is not obligated to pay you "more" because of your gender.  But they must also not pay you less because of gender.  And yet every company will generally try to pay as little as they think they can get away with.  So don't let them -- but keep discrimination out of the conversation unless there is really compelling proof of wrong doing.  (And if there is such evidence, I'd recommend looking elsewhere, and possibly exploring stronger legal measures.)

And yes, I strongly strongly believe that most men feel as I do.  They support the notion that everyone should be treated equally and professionally, and would like to stamp out sexism in the workplace, but many of us are starting to show symptoms of battle fatigue, and even more of us just don't want to be involved in a conflict at all.   Frankly, I think a lot of us are annoyed at feminist attempts to draw us into the conflict, even though we do support many of the stated goals of equal pay, fair treatment, etc. etc.

Closing Thoughts

As for me, I support the plight of women who find themselves discriminated against based on their gender, and I would like to see more women in my industry.  And I've put my money where my mouth is. 

But at the same time, you won't find me supporting "feminism".  I want to heal the rift, and work with awesome people -- and I happen to believe at least half of the awesome people in the world are of a different gender than I am.  Why would I want to alienate them?

I happen to believe that many well meaning people of many causes damage their cause by basically forcing people to deal with their "diversity" first, instead of being able to deal with people as people on their own merit.  It's so much harder to appreciate a person on her own merits, when at least half of what she is saying is that she's unfairly treated because of gender, race, sexual preference, etc.  This is true for everyone.  Show me how you're excellent, and I promise to appreciate you for your awesomeness, and to treat you fairly and with the same respect I would for anyone of my own gender/race/sexual preference.

You are awesome because of your accomplishments/innovations/contributions, not because of your gender or race or sexual preference.

But, if you won't let me look past your race/gender/etc. identity, then please don't be offended if I don't see anything else.  If you want to be treated like a "person", then let me see the person instead of just some classification in an equal opportunity survey.

Thursday, October 2, 2014

Supporting Women in Open Source

Please have a look at Sage Weil's blog post on supporting the Ada Initiative, which supports women in open source development.

Sage is sponsoring an $8192 matching grant, to support women in open source development of open storage technology.

You may have heard my talk recently, where I expressed that there have been no female contributions to illumos (that includes ZFS by the way!)  This is kind of a tragedy; the intelligence and creativity of at least half the population are simply not represented here, and we are worse off for it.

If you want to try to do something about it, here's a small thing.  There's a week remaining to do so, so I encourage folks to step up.  ($3392 has already been granted.)

I'm making a donation myself; if you think supporting more women in open source is a worthwhile cause, please join me!