Tuesday, May 27, 2014

illumos, identification, and versions

Recently, there has been a bit of a debate on the illumos mailing lists, beginning I suppose with an initial proposal I made concerning the output from the uname(1) command on illumos, which today, when presented with "-s -r" as options, emits "SunOS 5.11".

A lot of the debate centers on various ways that software can or cannot identify the platform it is running on, and use that information to make programmatic decisions about how to operate. Most often these are decisions made at compile time, by tools such as autoconf and automake.

Clearly, it would be better for software not to rely on uname for programmatic uses, and detractors of my original proposal are correct that even in the Linux community, the value from uname cannot be relied upon for such use.  There are indeed better mechanisms, such as sysconf(3c), or the ZFS feature flags, for determining individual capabilities.  Indeed, the GNU autotools contain many individual tests for such things, and arguably discourage the use of uname except as a last resort.
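
To illustrate the point with a Go sketch (not code from any of these packages): a program that cares about the page size or CPU count should query those capabilities directly, the way sysconf(3c) would be used from C, rather than inferring anything from the uname string.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// capabilities queries a few properties directly from the runtime,
// rather than inferring them from a platform-identity string.
func capabilities() (pageSize, cpus int) {
	return os.Getpagesize(), runtime.NumCPU()
}

func main() {
	pg, n := capabilities()
	fmt.Println("page size:", pg) // analogous to sysconf(_SC_PAGESIZE)
	fmt.Println("cpus:", n)       // analogous to sysconf(_SC_NPROCESSORS_ONLN)
	fmt.Println("os:", runtime.GOOS) // coarse identity, for humans and logs only
}
```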

Yet there can be no question that a number of packages do make such use.  And changes to the output from uname are risky for such packages.

But perversely, not changing the output from uname also creates risk for such packages, as the various incarnations of SunOS 5.11 become ever less like one another.  Indeed, illumos != SunOS, and uname has become something of a lie over the past 4 years or so.

Clearly, the focus for programmatic platform determination -- particularly for enabling features or behaviors -- should be to move away from uname.  (Changing uname may actually help package maintainers identify this questionable behavior as questionable, although there is no doubt that such a change would be disruptive to them.)

But all this debate completely misses the other major purpose of uname's output, which is to identify the platform to humans -- be they administrators, developers, or distributors.  There is no question in my mind that illumos' failure to self-identify, and to have a regular "release" schedule (for whatever a release really means in this regard), is harmful.

The distributors, such as Joyent (SmartOS), would prefer that people identify only their particular distribution, and I believe that much of the current argument from them stems from two primary factors.  First, they see additional effort in any change, with no direct benefit to them.  Second, in some ways these distributors are disinclined to emphasize illumos itself.  Some of the messages sent (either over IRC or in the email thread) clearly portray some resentment towards the rest of the illumos ecosystem (especially some of the niche players), as well as a very heavy-handed approach by one commercial concern towards the rest of the ecosystem.

Clearly, this flies in the face of the spirit of community cooperation in which illumos was founded.  illumos should not be a slave to any single commercial concern, and nobody should be able to exercise a unilateral veto over illumos, or shut down conversations with a single "thermonuclear" post.

So, without regard to the merits, or lack thereof, of changing uname's output, I'm quite certain, with sufficient clear evidence of my own gathering, that illumos absolutely needs regularly scheduled "releases", with humanly understandable release identifiers.  The lack of both the releases, and the identifiers to go with them, hinders meaningful reviews, hurts distributors (particularly smaller ones), and makes support for the platform harder, particularly when questions of support are considered across distributions.  All of these hinder adoption of the illumos platform; clearly an undesirable outcome.

Some would argue that we could use git tags for the identifiers.  From a programmatic standpoint, these would be easy to collect.  But although they have problems of their own (particularly for distributions which either do not use git, or use a private fork that doesn't preserve our git history), there are worse problems still.

Specifically, humans aren't meant to derive meaning from something like "e3de96f25bd2ea4282eea2d1a86c1bebac8950cb".  While Neo could understand this, most of us mere mortals simply can't understand such tags.  Worse, there is no implied sequencing here.  Is "e3de96f25bd2ea4282eea2d1a86c1bebac8950cb" newer than "d1007364f5b14efdd7d6ba27aa458669a6365d48"?  You can't do a meaningful comparison without examining the actual git history.
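
A small Go sketch makes the contrast concrete.  The dotted release identifiers below are hypothetical, but they can be compared field by field -- something no amount of string manipulation will do for a pair of git hashes:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// newerRelease compares two dotted release identifiers (e.g. "2014.2.0")
// numerically, field by field. No such comparison is possible for git
// hashes without walking the repository history.
func newerRelease(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		an, _ := strconv.Atoi(as[i])
		bn, _ := strconv.Atoi(bs[i])
		if an != bn {
			return an > bn
		}
	}
	return len(as) > len(bs)
}

func main() {
	fmt.Println(newerRelease("2014.2.0", "2014.1.9")) // true
	fmt.Println(newerRelease("2014.1.0", "2014.1.1")) // false
}
```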

This makes it hard to guess whether a given running release has a particular bug fix integrated or not.  It makes it hard to have conversations about the platform.  It even makes it hard for independent reviewers to say anything meaningful about the platform in the context of reviewing a distribution.

Furthermore, when we talk about the platform, it's useful for version numbers to convey more than just serial meaning.  In particular, version numbers help set expectations for developers.  Historically, Solaris (or rather SunOS, from which illumos is derived) set those expectations in the form of stability guarantees.  A given library interface might be declared Stable, Evolving (or Committed or Uncommitted), or Obsolete.  This was a way to convey to developers the relative risks of an interface (in terms of interface change), and it set some ground rules for the rate of change.  Indeed, Solaris (and SunOS) relied upon a form of semantic versioning, and many of the debates in Sun's architectural leadership for Solaris (PSARC) revolved around these commitments.

Yet today, the illumos distributors seem all too willing to throw that bit of our heritage into the waste bin.  A trend, I fear, which ultimately leads to chaos, and an increase in the difficulty of adoption by ISVs and developers.

illumos is one component -- albeit probably the most important by far -- of a distribution.  Like all components, it is useful to be able to identify it and talk about it.  This is not noticeably different from Linux, Xorg, GNOME, or pretty much any of the other systems which you are likely to find as part of a complete Ubuntu or RedHat distribution.  In fact, our situation is entirely analogous, other than that we combine our libc and some key utilities and commands with the kernel.

Technically, in Solaris parlance, illumos is a consolidation.  In fact, this distinction has always been clear.  And historically, the way the consolidation is identified is with the contents of the utsname structure, which is what uname emits.

In summary, regardless of whether we feel uname should return illumos or not, there is a critical need (although not one necessarily agreed upon by certain commercial concerns) for the illumos platform to have, at a minimum, a release number, and this release number must convey meaningful information to end-users, developers, and distributors alike.  It would be useful if this release number were obtainable in the traditional fashion (uname), but it's more important that the numbers convey meaning in the same way across distributions (which means packaging metadata cannot be used, at least not exclusively).

Sunday, April 6, 2014

SP protocols improved again!

Introduction


As a result of some investigations performed in response to my first performance tests for my SP implementation, I've made a bunch of changes to my code.

First off, I discovered that my code was rather racy.  When I started bumping up GOMAXPROCS, and used the -race flag with go test, I found lots of issues.

Second, there were failure scenarios where performance fell off a cliff, as the code dropped messages, needed to retry, and so on.

I've made a lot of changes to fix the errors.  But, I've also made a major set of changes which enable a vastly better level of performance, particularly for throughput sensitive workloads. Note that to get these numbers, the application should "recycle" the Messages it uses (using a new Free() API... there is also a NewMessage() API to allocate from the cache), which will cache and recycle used buffers, greatly reducing the garbage collector workload.
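
A rough sketch of how such a message cache can work in Go, using sync.Pool (the real implementation may well differ; the Message fields here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a simplified stand-in for the library's message type.
type Message struct {
	Header []byte
	Body   []byte
}

// msgPool caches Messages so their buffers can be reused across sends,
// keeping allocation and garbage collection out of the hot path.
var msgPool = sync.Pool{
	New: func() interface{} { return &Message{Body: make([]byte, 0, 4096)} },
}

// NewMessage returns a message from the cache, allocating only when
// the cache is empty.
func NewMessage() *Message {
	return msgPool.Get().(*Message)
}

// Free truncates the buffers and returns the message to the cache.
func (m *Message) Free() {
	m.Header = m.Header[:0]
	m.Body = m.Body[:0]
	msgPool.Put(m)
}

func main() {
	m := NewMessage()
	m.Body = append(m.Body, []byte("hello")...)
	fmt.Println(string(m.Body)) // hello
	m.Free() // recycle the buffer instead of leaving it for the GC
}
```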

Throughput


So, here are the new numbers for throughput, compared against my previous runs on the same hardware, including tests against the nanomsg reference itself.

Throughput Comparison (Mb/s)

transport    nanomsg 0.3beta   old gdamore/sp   new (1 thread)   new (2 threads)   new (4 threads)   new (8 threads)
inproc 4k    4322              5551             6629             7751              8654              8841
ipc 4k       9470              2379             6176             6615              5025              5040
tcp 4k       9744              2515             3785             4279              4411              4420
inproc 64k   83904             21615            45618            35044 (b)         44312             47077
ipc 64k      38929             7831 (a)         48400            65190             64471             63506
tcp 64k      30979             12598            34994            49608             53064             53432

(a) I think this poor result is from retries or resubmits inside the old implementation.
(b) I cannot explain this dip; unrelated activity or GC activity may be to blame.

The biggest gains are with large frames (64K), although there are gains for the 4K size as well.  nanomsg still outperforms at the 4K size, but with 64K my message caching changes pay dividends, and my code actually beats nanomsg rather handily for the TCP and IPC cases.

I think for 4K we're hurting due to inefficiencies in the Go TCP handling below my code.  My guess is that there is a higher per-packet cost here, and that is what is killing us.  This may be true for the IPC case as well.  Still, these are very respectable numbers, and for some very real and useful workloads my implementation matches and even beats the reference.

The new code really shows some nice gains for concurrency, and makes good use of multiple CPU cores.

There are a few mysteries though.  Notes "a" and "b" point to two of them.  The third is that the IPC performance takes a dip when moving from 2 threads to 4.  It still significantly outperforms the TCP side though, and is still performing more than twice as fast as my first implementation, so I guess I shouldn't complain too much.

Latency


The latency has shown some marked improvements as well.  Here are new latency numbers.

Latency Comparison (usec/op)

transport   nanomsg 0.3beta   old gdamore/sp   new (1 thread)   new (2 threads)   new (4 threads)   new (8 threads)
inproc      6.23              8.47             6.56             9.93              11.0              11.2
ipc         15.7              22.6             27.7             29.1              31.3              31.0
tcp         24.8              50.5             41.0             42.7              42.9              42.9

All in all, the round trip times are reasonably respectable.  I am especially proud of how close I've come to the best inproc time -- a mere 330 nsec (6.56 vs. 6.23 usec per op) separates the Go implementation from the nanomsg native C version.  When you factor in the heavy use of goroutines, this is truly impressive.  To be honest, I suspect that most of those 330 nsec are actually lost in the extra data copy that my inproc implementation has to perform to simulate the "streaming" nature of real transports (i.e. data and headers are not separate on message ingress).

There's a sad side to the story as well.  TCP handling seems to be less than ideal in Go.  I'm guessing that some effort is made to use larger TCP windows, and Nagle may be at play here as well (I've not checked).  Even so, I've made a 20% improvement in TCP latencies from my first pass.

The other really nice thing is near linear scalability when threads (via bumping GOMAXPROCS) are added.  There is very, very little contention in my implementation.  (I presume some underlying contention for the channels exists, but this seems to be on the order of only a usec or so.)  Programs that utilize multiple goroutines are likely to benefit well from this.
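
The benchmark pattern is roughly the following sketch (illustrative only, not the actual harness): fix the amount of work, bump GOMAXPROCS, and fan the messages across a matching number of goroutines reading from a shared channel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// work fans msgs messages across the given number of worker goroutines
// and returns the total number of messages consumed.
func work(workers, msgs int) int {
	ch := make(chan int, workers)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := 0
			for range ch { // each worker drains its share of the channel
				n++
			}
			mu.Lock()
			total += n
			mu.Unlock()
		}()
	}
	for i := 0; i < msgs; i++ {
		ch <- i
	}
	close(ch)
	wg.Wait()
	return total
}

func main() {
	for _, p := range []int{1, 2, 4, 8} {
		runtime.GOMAXPROCS(p) // add parallelism, as in the benchmark runs
		fmt.Println(p, work(p, 10000))
	}
}
```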

Conclusion


Simplifying the code to avoid certain indirection (extra passes through additional channels and goroutines), and adding a message pooling layer, have yielded enormous performance gains.  Go performs quite respectably in this messaging application, comparing favorably with a native C implementation.  It also benefits from additional concurrency.

One thing I really found was that it took some extra time to get my layering model correct.  I traded complexity in the core for some extra complexity in the Protocol implementations.  But this avoided a whole other round of context switches, and enormous complexity.  My use of linked lists, and the ugliest bits of mutex and channel synchronization around list-based queues, were removed.  While this means more work for protocol implementors, the reduction in overall complexity leads to marked performance and reliability gains.

I'm now looking forward to putting this code into production use.

Thursday, March 27, 2014

Names are Hard

So I've been thinking about naming for my pure Go implementation of nanomsg's SP protocols.

nanomsg is trademarked by the inventor of the protocols.  (He does seem to take a fairly loose stance on enforcement, though -- he advocates using names that are derived from nanomsg, as long as it's clear that there is only one "nanomsg".)

Right now my implementation is known as "bitbucket.org/gdamore/sp".  While this works for code, it doesn't exactly roll off the tongue.  It's also a problem for folks wanting to write about this.  So the name can actually become a barrier to adoption.  Not good.

I suck at names.  After spending a day online with people, we came up with "illumos" for the other open source project I founded.  illumos has traction now, but even that name has problems.  (People want to spell it "illumOS", and they often mispronounce it as "illuminos" -- note there are no "n"s in illumos.  And, worse, it turns out that the leading "i" is indistinguishable from the following "l"s -- like this: Illumos -- when used in many common sans-serif fonts, which is why I never capitalize illumos.  It's also had a profound impact on how I select fonts.  Good-bye, Helvetica!)

go-nanomsg already exists, by the way, but it's a simple foreign-function binding with a number of limitations, so I hope Go programmers will choose my version instead.

Anyway, I'm thinking of two options, but I'd like criticisms and better suggestions, because I need to fix this problem soon.

1. "gnanomsg" -- the "g" evokes "Go" (or possibly "Garrett", if I want to be narcissistic about it -- but I don't like vanity naming that way).  In pronouncing it, one could either use a silent "g", as in "gnome" or "gnat", or harden the "g", as in "growl", to distinguish it from "nanomsg".  The problem is that the pronunciation can lead to confusion, and I really don't like that the "g" could be mistaken to mean this is a GNU program, when it most emphatically is not.  Nor is it GPL'd, nor will it ever be.

2. "masago" -- this name distantly evokes "messaging a la Go", is a real-world word, and I happen to like sushi.  But it will be harder for people looking for nanomsg-compatible layers to find my library.

I'm leaning towards the first.  Opinions from the community solicited.



Wednesday, March 26, 2014

Early performance numbers

I've added a benchmark tool to my Go implementation of nanomsg's SP protocols, along with the inproc transport, and I'll be pushing those changes rather shortly.

In the meantime, here's some interesting results:

Latency Comparison (usec/op)

transport   nanomsg 0.3beta   gdamore/sp
inproc      6.23              8.47
ipc         15.7              22.6
tcp         24.8              50.5


The numbers aren't all that surprising.  Using Go, I'm using non-native interfaces, and my use of several goroutines to manage concurrency probably creates a higher number of context switches per exchange.  I suspect I might find my stuff does a little better with lots and lots of clients hitting it, where I can make better use of multiple CPUs (though one could write a C program that used threads to achieve the same effect).

The story for throughput is a little less heartening though:


Throughput Comparison (Mb/s)

transport   message size   nanomsg 0.3beta   gdamore/sp
inproc      4k             4322              5551
ipc         4k             9470              2379
tcp         4k             9744              2515
inproc      64k            83904             21615
ipc         64k            38929             7831 (?!?)
tcp         64k            30979             12598

I didn't try larger sizes yet; this is just a quick sample test, not an exhaustive performance analysis.  What is interesting is that the ipc case for my code is consistently low.  It uses the same underlying transport in Go as TCP does, but I guess maybe we are losing some TCP optimizations.  (Note that the TCP tests were performed over loopback; I don't really have 40GbE on my desktop Mac. :-)

I think my results may be worse than they would otherwise be, because I use the equivalent of NN_MSG to dynamically allocate each message as it arrives, whereas the nanomsg benchmarks use a preallocated buffer.   Right now I'm not exposing an API to use preallocated buffers (but I have considered it!  It does feel unnatural though, and more of a "benchmark special".)

That said, I'm not unhappy with these numbers.  Indeed, it seems that my code performs reasonably well given all the cards stacked against it.  (Extra allocations due to the API, extra context switches due to extra concurrency using channels and goroutines in Go, etc.)

A little more detail about the tests.

All tests were performed using nanomsg 0.3beta and my current Go 1.2 tree, running on my Mac under MacOS X 10.9.2, on a 3.2 GHz Core i5.  The latency tests measured full round trip times using the REQ/REP topology, and a 111-byte message size.  The throughput tests were performed using PAIR.  (Good news: I've now validated that PAIR works. :-)

The IPC tests were directed at a file path in /tmp, and TCP used 127.0.0.1 ports.

Note that my inproc transport tries hard to avoid copying, but it still copies due to a mismatch between the header and body locations.  I'll probably fix that in a future update (it's an optimization, and also kind of a benchmark special, since I don't think inproc gets a lot of performance-critical use; in Go, it would be more natural to use channels for that).

Monday, March 24, 2014

SP (nanomsg) in Pure Go

I'm pleased to announce that this past weekend I released the first version of my implementation of the SP protocols (scalability protocols, sometimes known by their reference implementation, nanomsg), written in pure Go.  This allows them to be used even on platforms where cgo is not present.  It may even be possible to use them in the playground (I've not tried yet!)

This is released under an Apache 2.0 license.  (It would be under the even more liberal BSD or MIT licenses, except I want to offer -- and demand -- patent protection to and from my users.)

I've been super excited about Go lately.  And having spent some time with ØMQ in a previous project, I was keen to try doing some things with its successor, the nanomsg project.  (nanomsg is a single-library message queue and communications system.)

Martin (the creator of ØMQ) has written rather extensively about how he wishes he had written it in C instead of C++.  And with nanomsg, that is exactly what he has done.

And C is a great choice for implementing something that is intended to be a foundation for other projects.  But it's not ideal for some circumstances, and the use of async I/O in his C library tends to get in the way of Go's native support for concurrency.

So my pure Go version is available in a form that makes the best use of Go, and tries hard to follow Go idioms.  It doesn't support all the capabilities of Martin's reference implementation -- yet -- but it will be easy to add those capabilities.

Even better, I found it pretty easy to add a new transport layer (TLS) this evening.  Adding the implementation took less than a half hour.  The real work was in writing the test program, and fighting with OpenSSL's obtuse PKI support for my test cases.

Anyway, I encourage folks to take a look at it.  I'm keen for useful & constructive criticism.

Oh, and this work is stuff I've done on my own time over a couple of weekends -- and hence isn't affiliated with, or endorsed by, any of my employers, past or present.

PS: Yes, it should be possible to "adapt" this work to support the native ØMQ wire protocol (ZMTP) as well.  If someone wants to do this, please fork this project.  I don't think it's a good idea to try to support both suites in the same package -- there are just too many subtle differences.

Thursday, February 6, 2014

The Failed Promise

My dislike for C++ is well-known by those who know me.  As is my lack of fondness for Perl.

I have a new one though.  Java.  Oracle and Apple have conspired to kill it.

(I should say this much -- it's been a long time, about a decade, since I developed any Java code.  It's entirely possible that I remember the language with more fondness than it truly warrants.  C++ was once beautiful too -- before ANSI went and mutated it beyond all hope, back in 1990 or thereabouts.)

Which is a shame, because Java started with such promise.  Write-once, run-anywhere.  Strongly typed, and a paradigm for OO that was so far superior to C++.  (Multiple inheritance is the bane of semi-competent C++ engineers, who often can't even properly cope with pointers to memory.)

For years, I bemoaned that the single biggest weakness of Java was that it was seen as a way to make more dynamic web pages.  (I remember HotJava -- what a revolutionary thing it was indeed.)  But even stand-alone apps struggled with performance (startup times were hideous for anyone starting a Swing app).

Still, we suffered, because the write-once, run-anywhere promise was just too alluring.  All those performance problems were going to get solved by optimization, and faster systems.  And wasn't it a shame that people associated Java with applets instead of applications?  (Java Web Start tried to drive more of the latter, which should have been a good thing.)

But all the promise that Java seemed to offer is well and truly lost now.  And I don't think it can ever be regained. 

Here's a recent experience I had.

I had reason to go run some Java code to access an IPMI console on a SuperMicro system.  I don't run Windows.   The IPMI console app seems to be a mishmash of Java and native libraries produced by ATEN for SuperMicro.  Supposedly it supported MacOS X and Linux.

I lost several hours (several, as in more than two) trying to get a working IPMI console on my Mac.  I tried Java 7 from Oracle, Java 6 from Apple, I even tried to get a Linux version working in Ubuntu (and after a number of false starts I did actually succeed in the last attempt.)  All just to get access to a simple IPMI console.  Seriously?!?

What are the problems here?
  • Apple doesn't "support" Java officially anymore.
  • Apple disables Java by default on Safari even when it is installed.
  • Apple disables Java Web Start almost entirely.  (The only way to open an unsigned Java Web Start file -- and has anyone ever even seen a signed .jnlp? -- is to download it, open it with a right-click in the Finder, and explicitly answer "Yes, I know it's not from an approved vendor, but open the darn thing anyway."  Even though Java also asks me the same question.  Several times.)
  • Oracle ships Java 7 without 32-bit support on Macs, so only Safari can use it (not Chrome).
  • Oracle Java 7 Update 51 has a new security manager that prevents most unsigned apps from running at all.  (The only way to get them to run at all is to track down the Java preferences pane and reduce the security setting.)
  • Developers like ATEN produce native libraries which means that their "Java" apps are totally unportable.
All of this is because people are terrified of the numerous bugs in Java.  Which is appropriate, since there have been bugs in various JVMs.  Running unverified code without some warning to the end-user is dangerous -- really, such code should be treated with the same caution due a native application.

But, I should also not have to be a digital contortionist to access an application.  I am using a native IPMI console from an IP address that I entered by hand on a secured network.  I'd expect to have to answer the "Are you sure you want to run this unsigned app?" question (perhaps with an option to remember the setting) once.  But the barriers to actually executing the app are far too high.  (So high, that I still have not had success in getting the SuperMicro IPMI console running on my Mac -- although I did eventually get it to work in a VM running Ubuntu.)

So, shame on Apple and Oracle for doing everything in their power to kill Java.  Shame on ATEN / SuperMicro for requiring Java in the first place, and for polluting it with native code libraries.  And for not getting their code signed.

And shame on the Java ecosystem for allowing this state of affairs to come about.

I'll most likely never write another line of Java, in spite of the fact that I happen to think the language is actually quite elegant for solving OO programming problems.  The challenges in deploying such applications are just far too high.  In the past, the problem was that the cost of the sandbox meant that application startups were slow.  Now, even though we have fast CPUs, we have traded 30-second application startup times for situations where it takes hours for even technical people to get an app running.

I guess it's probably better on Windows.

But if you want to use that as an excuse, then just write a NATIVE APPLICATION, and stop screwing around with this false promise of "run-anywhere". 

If a Javascript app won't work for you, then just bite the bullet and write a native app.  Heck, with the porting solutions available, it isn't that much work to make native apps portable to a variety of platforms.  (Though such solutions necessarily leave niche systems out in the cold.  Let's face it, the market for native desktop apps on SPARC systems running NetBSD is pretty tiny.)  But covering the 5 mainstream client platforms (Windows, Mac, iOS, Android, Linux) is pretty easy.  (Yes, illumos isn't in that list.  And maybe Linux shouldn't be either, although I think the number of folks running a desktop with Linux is probably several orders of magnitude larger than all other UNIXish platforms -- besides MacOS -- combined.)

And, RIP "write once, run-anywhere".  Congratulations Oracle, Apple.  You killed it.

Wednesday, January 29, 2014

Sorry for the interruption....

Some of you may have noticed that my email, or my blog, or website, was down.

I discontinued my hosting service with BlueHost, and when I did, I mistakenly thought that by keeping my domain registration with them, I'd still have DNS services.  It was a foolish mistake, and yet an easy one to make.  (I should have known better...)

Anyway, things are back to normal now, once the interweb's negative caches have timed out...

It does seem unfortunate that BlueHost:


  • Does not include DNS with domain registration, nor do they have a service for it unless you actually host content with them.
  • Did not back up my DNS zone data.  (So even if I paid them to reactivate hosting, I was going to have to re-enter my zone records by hand.)  This is just stupid.

So, I've moved my DNS, and when my registration expires, I'll be moving that to another provider as well.  (Dyn, in case you wondered.)

Saturday, January 18, 2014

Worst Company ... Ever

So I've not blogged in a while, but my recent experience with a major mobile provider was so astonishingly bad that I feel compelled to write about it.  (Executive summary: I will never do business with T-mobile again!)

I had some friends out from Russia back in November, and they wanted to purchase an iPhone 5s for their son as a birthday gift.  Sadly, we couldn't get unlocked ones from the Apple Store at the time, because they simply lacked inventory.

So we went to the local T-mobile store here on San Marcos Blvd (San Marcos, CA).  I knew that they were offering phones for full price (not subsidized), and it seemed like a possible way for them to get a phone.  Tiffany was the customer service agent who "assisted" us.  She looked like she was barely out of high school, but she seemed pleased to help us make a purchase.  I asked her no fewer than three times whether the phone was unlocked, and was assured that, yes, the phone was unlocked and would work fine with Russian GSM operators.  I was told that in order for T-mobile to sell us the phone, we had to purchase a 1-month prepaid plan for $50, and a $10 SIM card.  While I was a little annoyed at this, I understood that this was their prerogative, and the cost of wanting to buy the device "right now" rather than waiting two weeks for Apple to ship one.  (My friends had less than a week left in California at this point.)  We were buying the phone outright, so it seemed like there ought to be no reason for the device to have a carrier lock on it.

So, of course, the phone was not unlocked.  It didn't work at all when it got to Russia.  Clearly Tiffany didn't have a clue about the product she sold us.  I guess I shouldn't have been too surprised. But we had put down $760 in cash for a device that now didn't work.  Worst of all, I was embarrassed, as was my friend, when his son got what was essentially a lemon for his birthday gift.

I was pretty upset, and went back to the store.  Tiffany was there, and apologized.  She offered to pay up to $25 for me to use a third party service to unlock the phone since "it was her fault".  Of course, the third party service stopped offering this for iPhones, and their cost for iPhone 5's was over $100 when they were offering it.

I tried to get T-mobile to unlock the phone.  I waited in the store for a manager for about 2 hours, but Tiffany finally got upset and called the police to get me to leave the store after I indicated that I preferred to wait on the premises for a manager.   I left the store premises, unhappy, but resolved to deal with this through the corporate office.

I went home, and called T-mobile customer service.  It turns out that the T-mobile store is really run by a 3rd party company, even though the staff wear T-mobile shirts and exclusively offer T-mobile products.  (More on that later.)

I spent quite a few hours on the phone with T-mobile.  Somewhere in the process, the customer service people called Tiffany, whereupon she immediately reversed her story, claiming she had informed me that the phone was locked.  This was just the first of a number of false and misleading statements that were made to me by T-mobile or its affiliates.   I spent several hours on the phone that day.  At the end of the call, I was told that T-mobile would unlock my phone only after at least 40 days had passed from the date of activation.   The person at corporate assured me that if I would just wait the 40 days, then I'd be able to get the phone unlocked.  Since the phone was already in Russia, I figured it would take at least that long to send it back and send a replacement from another carrier.

Oh one other thing.  During that first call, I discovered that Tiffany had actually taken the $50 prepaid plan we purchased, and kept it for herself or her friends.  We didn't figure that out until T-mobile customer service told me that I didn't have a plan at all.  Nice little scam.  While dishonest, I would not have begrudged the action had the phone actually worked.  Of course it didn't.

(During all this, on several different occasions, T-mobile customer service personnel tried to refer me to the same 3rd party unlocking service.  Because hey, if T-mobile can't get its own phones unlocked, maybe their customers should pay some shady third party service to get it done for them.  But it turns out that whatever backdoor deal that service had previously doesn't work anymore, because they've stopped doing it for Apple phones.)

So we waited 40 days.

And then we called to have the phone unlocked.

And T-mobile refused again.  This time I was told that I needed to have $50.01 or more, not just $50, on the plan.  After a few more hours on the phone, and escalation up through a few levels of their management chain, they credited my account with $0.20, and then resubmitted the unlock request.  I was guaranteed that the phone would be unlocked.

Two days later, T-mobile denied the unlock request.

At this point, I was informed that the phone had to have T-mobile activity on it within 7 days of the unlock request.  This was simply not going to happen.  The phone was in Russia!  I complained, and I spent quite a few more hours on the phone with T-mobile.  It seems like the folks who control the unlocking process of their phones have nothing to do with the people who answer their customer service lines, nor those people's bosses, nor their bosses' bosses.  Astonishing!

At this point I had spent over a dozen hours trying to get T-mobile to unlock the darned thing.  T-mobile had gotten their money from me already, and had done pretty much everything possible to upset me.  The amount of money T-mobile spent dealing with me on the phone, trying to enforce a stupid policy, which wouldn't have been necessary if they had just admitted their mistake and fixed it (which would have cost them nothing whatsoever), is astonishing.  Talk about penny-wise and pound-foolish.

At that time, T-mobile told me that I would be able to return the other phone, provided I got it back from Russia.  They agreed to this even though the usual 14 days "buyer's remorse" window had passed.

So at this point, I went and purchased a phone from Verizon (paying a $50 premium, because I was at BestBuy instead of the Apple store, and the Verizon store wouldn't sell the phone unless it was part of a plan), and I sent it to Russia with my step-son, who was going there for Christmas.  That phone did work, and my step-son exchanged the T-mobile phone for it, bringing the T-mobile phone back.

The next day, I went to return the phone at the store where I bought it.  Sadly, Tiffany was there.  So was her manager, Erica.

After spending about an hour at the store trying to return it, they agreed to take it back, minus a $50 restocking fee.  And as I paid cash, I had to provide my checking account information so they could do a deposit for me, which would take a few days.  I got a paper showing that they were sending me a refund, but nothing indicating the account number.  (Turns out it took a few calls from Erica -- who claimed to be the store manager and probably was about 10 minutes older than Tiffany -- the next day, since she apparently had no clue what she was doing and needed to get two separate pieces of information that she failed to collect while I was in the store.)

I did spend a bunch of time with customer service on the phone -- a few more hours I guess, trying to get that last $50.  At this point it was just a matter of principle.  I resented the whole thing, and I wanted as much of my money back as possible.  The customer service person tried to make it right, but because the store (Hit Mobile is the company apparently) is separate from T-mobile, the store's decision was final.  The store manager (Erica) refused to refund the restocking fee, and there was nothing T-mobile could do about it.  To their credit, they did offer me a $50 service credit if I was inclined to keep an account with them.  Needless to say, I have no interest in ever being a T-mobile customer, so I thanked them and declined... indicating that I'd prefer to have funds back in cash.  (Nobody ever mentioned the restocking fee; on the contrary I was told I'd receive the full purchase price less the $60 service plan and SIM card.  Again, story and reality don't match at T-mobile.)

I did eventually get $650 credited back to my account.

So, I was out $110, plus the extra $50, plus the 13+ hours of my time, plus the embarrassment and long turnaround time to get the replacement out to Russia.  All because T-mobile wouldn't fix a mistake they made, even though numerous people at that company recognized that it was clearly the Right Thing To Do.

Turns out that Verizon iPhones are all unlocked.  And I'm already a Verizon customer.  I was thinking about T-mobile's plans -- some of them are attractive on paper, and potentially could have saved me money relative to the rather premium prices I pay for Verizon.  Needless to say, I will not be changing my provider any time soon.  In spite of the high prices, I've always been dealt with honestly and fairly, and I've been happy with the service I've gotten at Verizon.

If for some reason, you decide to get a T-mobile phone, please, please avoid the store located on San Marcos Blvd.  In fact, I urge you to verify that the store is owned by T-mobile corporation.  It's my belief that I might have had more satisfactory results had I been dealing with just a single entity.  That said, numerous people at T-mobile corporate lied to me and made false promises.

I feel so strongly about this, that I'm happy to have spent the hour or so writing up my terrible experiences with them.  I hope that I save someone else these painful experiences.  I won't be unhappy if it also costs T-mobile many potential customers.

For the record, I also think it should be illegal to sell a carrier locked device without clearly indicating this is so as part of the transaction.  The practice of carrier locking devices makes sense when the cost of the device is being subsidized by the carrier as part of a service plan.  But it should be possible to purchase the device outright and remove any such lock.  T-mobile's practices here are a disservice to their customers, and their partners.  The handset manufacturers should apply pressure to them to get them to change their policy here.

Thursday, October 17, 2013

illumos corporate entity... non-starter?

I want to give folks a status update on the illumos corporate entity.

In a nutshell, the corporate entity seems to be failing to have traction.  In particular, the various corporate contributors  and downstreams for illumos have declined to step up to ensure that an illumos corporate entity has sufficient backing to make it successful.

While at first blush this seems somewhat unfortunate, I think it is not nearly as bad a thing as it might first seem.  In particular, the failure of a corporate entity does not correlate to the health of the ecosystem -- indeed many successful open source projects operate without an umbrella organization or entity.

Instead, we see corporate contributors and downstream distributions focusing on developing the communities behind their distributions such as SmartOS and OmniOS.  Those downstreams play an active role in improving illumos for the benefit of all, and it's my sincere hope and belief that they will continue to evangelize illumos, and contribute to the common core.

Furthermore, incorporation in the state of California requires paying about $800 of taxes per year.  This is true even for organizations without any revenue.  This is money that would have to be taken from sponsors, and would serve only to enrich our state government with no direct benefit to the illumos community.  (Non-profit status is a way around that, but it's exceedingly difficult to obtain, and a number of open source organizations are finding themselves under very close scrutiny from the state and federal authorities.  Indeed, the X.org community itself lost its 501(c)(3) status not that long ago.)

So, without corporate sponsors to justify the tax and administrative load, I've decided that the illumos corporate entity should expire.  I do want to thank Deirdré Straughan for the non-trivial amount of effort she put into this, as well as the Software Freedom Law Center for the pro-bono work they did for us while we were trying to navigate the waters of becoming a true non-profit open source organization.

And, if any corporate sponsors are out there watching this, and interested in resurrecting the illumos organization, then I'm happy to entertain the possibilities.  I think there is value in an actual organization with a legal status to receive and distribute funds, and who can hold certain items of intellectual property, including the rights to the illumos trademark itself.   But there has to be enough of a sponsorship behind this to make it worthwhile.

In the meantime, I will continue to hold the illumos mark as a trademark that I keep in trust for the community.  The code is something that is already subject to distributed ownership, and so is completely unimpacted by any of this.

Thanks.

  - Garrett

Sunday, October 13, 2013

Moving on (Adieu to Studio?)

I think illumos is at a key juncture, and the issue relates to our toolchain.

We have gcc 4.4.4 working thanks in large part to the efforts of folks like Rich Lowe.

We have historically relied on Sun Studio (now Solaris Studio) as our "base" or "default" compiler for C and C++ programs, and also to supply lint coverage.

I've been thinking for a long time that it's past time we (the illumos community) moved on from this.  Not only are the Studio 12 compilers not available in source form, they are now not available in suitable form for building illumos as binaries either.  (Apparently it is possible under some terms to get Solaris Studio 12.3, but who knows if those compilers are suitable for building illumos.  In the past we have always needed specifically patched compilers for Solaris.)

The situation where a general developer cannot obtain the necessary tool chain for working with illumos is untenable.

Today we require "lint" as part of an official build.  And not just any lint, but lint generated by a specific patched older version of Sun Studio 12 -- a version that is no longer available to the community at large.

(And in full disclosure, the problem for me is brought to a head by the fact that the machine I had my local installation of these tools on has been retired, and I find I don't have another backup of them readily at hand to install on my new machine.)

So, I'm placing this as a call to the illumos community at large, and especially to our RTI advocates.  It's time.  Really.

Really.

Studio has to go.

I don't care if we leave the infrastructure in place for people that want to continue to use Studio, but we need to switch to gcc.  Rich's 4.4.4 version gets the job done for now.  It would be great if we could support newer versions, but I understand that this requires some non-trivial amount of effort (some gcc patches that need to be upstreamed, as I understand it.)

We cannot be tied to a closed tool chain; especially a tool chain that doesn't at least include binary redistribution privileges. 

For the record, I've posted the same content to the illumos developer and advocates lists.

Thursday, August 29, 2013

Responding to Trolls

We have an... ahem... toxic personality in our midst in illumos.  I'm going to respond here to his most recent post, to avoid wasting the time on the illumos developers mailing list.  If you don't care about nonsense like this (although it may have entertainment value), please just ignore this post and move on.

The only reason I'm replying even here, is to set the record straight, and establish firmly the history -- at least my side of it -- for posterity.

In the early days of illumos, I invited those I could think of from the Solaris and OpenSolaris communities to participate.  I even reached out to some parties, like Jörg Schilling, who had a bad reputation as difficult people to work with.

I had hoped to repair those broken relationships since there was no longer a corporate controlling entity for illumos.  (Nexenta sponsored my efforts, and paid some of the early operating costs for illumos, but it has always been the case that illumos operates by and for the community -- and not at the behest of any corporate master.  As I've since left Nexenta to start my own company, and illumos has carried on, I think there is fairly ample evidence in support of this fact.)

I reached out, and tried to help Jörg with some integrations; but those integrations did not go well.  Jörg did not like our policy requiring a build or test of changes on the primary distribution (OpenIndiana at the time) used for the vast majority of developers and users, and so several of his changes did not get integrated.  (It turns out this was a good thing, because at least one of those changes, a "fix" for a keyboard problem, was mis-designed and would have broken international keyboards for the majority of illumos' user community.)

We also required code review, as we do now.

Jörg has long had a desire to have "star" integrated as the main replacement for all archivers.  That didn't take place, because he couldn't find reviewers.  And after several negative interactions with him, I withdrew my offer to sponsor his work personally, as I found the process of trying to do so unbelievably painful.  (There was an extremely long telephone call with Jörg that sticks in my memory.)  I suggested he find someone else willing to code review and sponsor his work for integration into illumos.  To date, that has not happened.

Somewhere during that time, Jörg indicated that he would not offer any other code up for integration until star was integrated.  Basically, he took his toys and went home.  Except he just took his toys, but forgot to go home, as he's still trying to create noise on our developer mailing lists.

There was another negative experience we had, mirroring today's thread, where Jörg's response to a tool I wrote (an open replacement for "od") was non-constructive criticism.  In fact, his response was "it's broken, why don't you use this thing I haven't finished writing yet".  (He modified his "hexdump" utility to behave like od, apparently in response to my effort, instead of simply offering helpful review feedback.)

This is typical of Jörg -- the only "right" answer as far as he is concerned is to accept his code, even if it isn't written yet or is buggy!

Jörg is what is known as a poisonous personality.  At least in every context that I've seen him operate. Which is why usually I try to just ignore him, and hope he'll go away; yet like certain garden pests he just keeps coming back.  He's even been kicked out of another prominent open source community.

He's also made some incorrect and defamatory statements about both Nexenta and illumos at a prominent academic conference in Germany.  (I can't find the link right now, I'll update if I find it.)

So, where in his post today he asks Gordon if he's willing to change the rules for integration, I have this in reply:

Are you willing to change this? 
Gordon isn't empowered to.  Neither, for that matter, am I.  The rules that have been established for collaboration here are by common consent of the major contributors, and no single individual can change them.  Notably, those were the rules in force in Summer 2010, and you were the one that wanted special treatment.   Those rules have served this community well for the past several years, and there has been no movement to change them.

I am interested in setting up a phalanx of OpenSolaris activities because this
is the only way we have a chance to continue with OpenSolaris development but
this is hard as long as Illumos is not willing to play nicely.

OpenSolaris is dead.  The entire illumos community has moved on.  You have not, but then I have long since stopped considering you a part of the illumos community.
Especially given your repeated attacks against both illumos and many of its contributors -- even fairly early on you made a number of defamatory and incorrect statements about the nature of both illumos and its contributors in a technical presentation before the German academic community.  This was not the behavior of a constructive contributor.
To be completely honest, I think I'm not alone in wishing that you'd come to the same conclusion: that is that you are not part of this community.

I recently tried to build a bridge by propising that Illumos should implement
the proposals from Summer 2010 to regain credibility. Unfortuntely there was no
reaction.
 
That should speak volumes all by itself. 
As someone else in another open source community said, Jorg Schilling is damage; the community should route around him.

Tuesday, August 27, 2013

An important milestone

Yesterday, with very little fanfare, illumos passed an important milestone.

This milestone was the integration of issue 195, "Need replacement for nfs/lockd+klm".

This is work that I originally tasked Gordon Ross and Dan Kruchinin to work on while we were colleagues at Nexenta.  Gordon started the work, picking up bits from FreeBSD as a starting point and gluing it to the illumos entry points.  Dan continued with it, and fleshed out a lot of the skeleton that Gordon had started.  It was subsequently picked up by the engineers at Delphix, and -- after some important bug fixing work was completed -- was integrated into their product quite some time ago; reportedly it has been stable there since December.

The reason this is such an important milestone is twofold:

First, this represents a substantial collaborative effort, involving contributions from parties across several organizational boundaries.  The level of collaboration achieved here is a win for the greater good of the community.

Second, this effort represents the replacement of the last of the truly closed source code in our kernel proper.  There are still some closed source drivers, although I will point out that we're now seeing more traction with vendors to supply open drivers, with recent open source code contributions coming from places like SolarFlare, LSI, and HP.  (HP's contribution -- through illumos partner Joyent -- means that at least one important closed source driver will soon be open -- cpqary3.)

I want to personally thank everyone who contributed to make the integration of the new open source NFS lock manager possible. You've done a big service to the community, and we are grateful!

Thursday, January 17, 2013

ZFS Compression Enhancements in illumos

The ZFS code in illumos has gained another differentiator thanks to the hard work of Saso, who integrated LZ4 compression into our code base.

This is a nice achievement, and enhanced compression should be quite useful for most applications of ZFS.  If LZJB was always recommended to be turned on (it was by our team at least!), then LZ4 is even more recommended, unless all of your data is totally incompressible.  (Most data is highly compressible.)
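The compressibility claim is easy to check for yourself.  LZ4 isn't in Python's standard library, so this sketch uses zlib at its fastest level as a stand-in for a lightweight compressor; the exact ratio and speed differ from LZ4's, but the behavior on typical, repetitive data is the point.

```python
import zlib

# Typical structured data (logs, text, configuration) is highly
# repetitive, which is exactly what lightweight compressors like
# LZJB and LZ4 exploit.  zlib level 1 stands in for LZ4 here.
data = b"2013-01-17 12:00:00 tank: read 4096 bytes from vdev 0\n" * 1000

compressed = zlib.compress(data, 1)
print("original:  ", len(data), "bytes")
print("compressed:", len(compressed), "bytes")
print("ratio:      %.1fx" % (len(data) / len(compressed)))
```

On a ZFS dataset this tradeoff is applied transparently once compression is enabled, and blocks that don't shrink are simply stored uncompressed.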

You should check out his wiki post above, which contains a lot more details about the work, including some benchmark data.

Saturday, October 27, 2012

hackathon project: pfiles - postmortem analysis

During the first days of October 2012, we had an illumos day, and a one-day illumos hackathon that was very well attended by some of the best and brightest in our community (and I daresay, in any open source community).

The purpose of the hackathon, besides just doing some cool projects, is to give folks a chance to interact and write code with other domain experts, particularly in areas outside their own expertise.

This year a number of interesting projects were worked on, and I think some of these are at the point of starting to bear fruit.  The project I undertook was the addition of post-mortem analysis support for pfiles(1).  That is, making it possible to use pfiles against a core(4) file.  Adam Leventhal suggested the project, and provided the initial suggestion on the method to use for doing the work.

This is a particularly interesting project, because the information that pfiles reports is located in kernel state (in the uarea) and has not traditionally been available in a core file.

To complete this project, I had to work in a few areas that were unfamiliar to me.  In particular, I had never looked at the innards of ELF files, nor at the process by which a core file is generated.  To add the necessary information, I had to create a new type of ELF note, and then add code to generate it to libproc (used by gcore) and elfexec (for kernel driven core dumps).  I also had to add support to parse this to libproc (for pfiles), and elfdump.  The project turned out to be bigger than anticipated, consisting of almost a thousand lines of code.  So it took me a little more than a day to complete it.
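For those who haven't poked at this machinery before, ELF note records have a simple layout: a header of three 32-bit words (namesz, descsz, type), then the name and the descriptor, each padded to a 4-byte boundary.  The sketch below builds and parses such a record in Python; the NT_FDINFO value is a placeholder for illustration (the real constant lives in the illumos headers), and the descriptor here is just an opaque blob rather than a real prfdinfo_t.

```python
import struct

NT_FDINFO = 0x14  # placeholder value for illustration, not the illumos constant

def pad4(b):
    # ELF note fields are padded out to 4-byte boundaries
    return b + b"\0" * (-len(b) % 4)

def build_note(name, ntype, desc):
    # Header is three little-endian 32-bit words: namesz, descsz, type.
    # namesz counts the name's terminating NUL byte.
    hdr = struct.pack("<3I", len(name) + 1, len(desc), ntype)
    return hdr + pad4(name + b"\0") + pad4(desc)

def parse_note(buf):
    namesz, descsz, ntype = struct.unpack_from("<3I", buf, 0)
    off = 12
    name = buf[off:off + namesz - 1]   # strip the trailing NUL
    off += (namesz + 3) & ~3           # skip the padded name
    return name, ntype, buf[off:off + descsz]

# A descsz of 0x440 matches the elfdump output in the sample run below.
note = build_note(b"CORE", NT_FDINFO, b"\0" * 0x440)
name, ntype, desc = parse_note(note)
print(name, hex(ntype), len(desc))
```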

A webrev with the associated changes is presented for your enjoyment.  I'd appreciate any review feedback, and especially any further testing.

Here's a sample run:




garrett@openindiana{250}> cat < /dev/zero &
[2] 144241
garrett@openindiana{251}> pfiles 144241
144241: cat
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:526,0 ino:35127324 uid:0 gid:3 rdev:67,12
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/mm@0:zero
      offset:72712192
   1: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:72704245
   2: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:72704245
garrett@openindiana{252}> gcore 144241
gcore: core.144241 dumped
garrett@openindiana{253}> pfiles core.144241 
core 'core.144241' of 144241: cat
   0: S_IFCHR mode:0666 dev:526,0 ino:35127324 uid:0 gid:3 rdev:67,12
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/mm@0:zero
      offset:195125248
   1: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:161882375
   2: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:161882375
garrett@openindiana{254}> kill -11 144241
garrett@openindiana{255}> pfiles core
core 'core' of 144241: cat
   0: S_IFCHR mode:0666 dev:526,0 ino:35127324 uid:0 gid:3 rdev:67,12
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/mm@0:zero
      offset:419414016
   1: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:296960430
   2: S_IFCHR mode:0620 dev:527,0 ino:893072809 uid:101 gid:7 rdev:27,2
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/2
      offset:296960430
[2]  - Segmentation fault            cat < /dev/zero (core dumped)

garrett@openindiana{256}> elfdump core
...


  entry [10]
    namesz: 0x5
    descsz: 0x440
    type:   [ NT_FDINFO ]
    name:
        CORE\0
    desc: (prfdinfo_t)
        pr_fd:        0
        pr_mode:      [ S_IFCHR S_IRUSR S_IWUSR S_IRGRP S_IWGRP S_IROTH S_IWOTH ]
        pr_uid:       0                  pr_gid:      3
        pr_major:     526                pr_minor:    0
        pr_rmajor:    67                 pr_rminor:   12
        pr_ino:       35127324
        pr_size:      0                  pr_offset:   419414016
        pr_fileflags: [ O_RDONLY O_LARGEFILE ]
        pr_fdflags:   0
        pr_path:      /devices/pseudo/mm@0:zero

  entry [11]
    namesz: 0x5
    descsz: 0x440
    type:   [ NT_FDINFO ]
    name:
        CORE\0
    desc: (prfdinfo_t)
        pr_fd:        1
        pr_mode:      [ S_IFCHR S_IRUSR S_IWUSR S_IWGRP ]
        pr_uid:       101                pr_gid:      7
        pr_major:     527                pr_minor:    0
        pr_rmajor:    27                 pr_rminor:   2
        pr_ino:       893072809
        pr_size:      0                  pr_offset:   296960430
        pr_fileflags: [ O_RDONLY O_NOCTTY O_LARGEFILE ]
        pr_fdflags:   0
        pr_path:      /dev/pts/2

  entry [12]
    namesz: 0x5
    descsz: 0x440
    type:   [ NT_FDINFO ]
    name:
        CORE\0
    desc: (prfdinfo_t)
        pr_fd:        2
        pr_mode:      [ S_IFCHR S_IRUSR S_IWUSR S_IWGRP ]
        pr_uid:       101                pr_gid:      7
        pr_major:     527                pr_minor:    0
        pr_rmajor:    27                 pr_rminor:   2
        pr_ino:       893072809
        pr_size:      0                  pr_offset:   296960430
        pr_fileflags: [ O_RDONLY O_NOCTTY O_LARGEFILE ]
        pr_fdflags:   0
        pr_path:      /dev/pts/2
...


Monday, October 22, 2012

Latest Apple EarPods - Success!

A couple of months ago I purchased a rather expensive pair of Apple in-ear earphones, to replace an older pair of standard earbuds I'd given to someone else.  They never fit my ear correctly, and I hated them.

Recently I've also taken up getting more physically fit -- so going to the gym more often, including spending a lot of time running on the treadmill.

But my earphones kept falling out.  And my iPhone 4 is kind of heavy and inconvenient.

So I picked up the iPod nano (7th gen) today at Best Buy.  And it comes with a new style of earbud.

I've been wearing them now for about 7 hours, including a fairly intense 40 minutes of running on the treadmill.  They haven't budged once.  And while I'm no audiophile, the audio quality seems to be better than any of the other earbuds I've used.

And the nano -- really light, really nice.  It doesn't do everything, and I've not tried out all the functions yet, but for playing music, it works great.  And, unlike my iPhone, it doesn't have a tendency to change songs or activate other functions while it is in my pocket.   So, +1 on the iPod nano too.

Just be aware that the earpods that come with the nano do not include the remote (and I think they lack a mic as well).  It looks like the $29 separate ones include the remote, and a mic.  I'll be purchasing a pair for other use... I wish they'd package them the same way the in-ear headphones are packaged though.  (Much easier to keep them stored tangle-free.)

Tuesday, October 16, 2012

GNU grep - A Cautionary Tale About GPLv3

My company, DEY Storage Systems, is in the process of creating a new product around the illumos operating system.  As you might imagine, this product includes a variety of open and proprietary source code.  The product itself is not delivered as a separate executable, but as a complete product.  We don't permit our customers to crack it open, both to protect our IP and to protect our support and release engineering organizations -- our software releases consist of only a single file, and we don't supply tools or source for other parties to modify that file.

One of the pieces that we wanted to integrate into the tree is an excellent little piece of software called Zookeeper, produced by the Apache organization.  Like illumos, Zookeeper has a nice non-viral license, which makes it suitable for integration into our product.

However, I discovered that as part of our integration, one of my engineers had decided to integrate GNU grep.  Why?  Because the startup script for Zookeeper apparently makes use of some GNU grep extensions that illumos' grep does not support.  (The need for us to enhance illumos' grep utility, and the need to teach folks not to rely on non-portable GNU extensions, are both topics for another time.)

Fortunately, I caught this in time.  Because that one little "worm" - used simply to support a startup script, could have created a situation where we would have been required to open source our entire product.

Now if you're Richard Stallman -- this is your goal -- all source code is "liberated".

But if you're a business trying to create value, this is actively harmful.  It's almost impossible to build a product around GPLv3 code unless the only way you create value is through selling support.  (There are a variety of business reasons this is a bad model... with open code, 3rd parties can start "selling support" for your product, possibly giving your product a bad name, and generally leeching off your engineering effort.  In the end, without proprietary code, there is vastly reduced economic incentive to innovate -- you can't use innovation in software to gain a competitive advantage when all software is open.  I would argue that even the innovation that occurs in Linux exists largely due to economic pressures arising from attempts to compete against proprietary systems.)

So, while the correction here is not big for us -- just a small change to a startup script for now -- the hazard of these viral licenses cannot be overstated.  If you make your living, as I do, writing software, then you should be very wary of the lure of code carrying the GPLv3 -- it's a potential license bomb that may have nasty consequences for you later.

And if you're a producer of open source software -- as I am -- be wary of your dependencies as well.  While your code may be licensed under reasonably generous terms that support commercial uses, if your dependencies include things like GPLv3 GNU grep, you may find that your product has less wide acceptance than you intended.  Inclusion (by dependency or otherwise) of a GPLv3 product in your own product means that you are willing to abdicate any decisions about the licensing of your own product.

I'll get off my soapbox now...

Thursday, September 20, 2012

Infographic - Programming

I was forwarded the following infographic from OnlineCollege.org.  While I'm not normally one to just regurgitate someone else's posting (especially advertising), I think this one is worth looking at -- especially if you're at a point where you're contemplating future career or learning plans.



Programming Infographic

illumos and ZFS day!

I'm pleased to announce that DEY Storage Systems, Inc., along with Joyent and other community members, is sponsoring the illumos and ZFS Day taking place on October 1st and 2nd at the Children's Creativity Museum in San Francisco.  My partners and I will be in attendance, along with a few of my other colleagues, and we'll be speaking on related topics.

Additionally, Joyent is hosting the 2nd illumos hackathon at their facilities on October 3, and we will definitely be participating there.  We had a blast doing this last year (some excellent work came out of it), and we're looking forward to doing it again.

If you can make it, I hope you'll register.  The event itself is free, and while there will be online recordings of the material, it's more fun to get together in person -- especially with a beer or three.  (Did I mention there will be a Solaris Family Reunion party on Monday night?)

In many ways, I view this event as a belated 2nd birthday party for illumos, so I hope you'll join me in celebration.

Sunday, September 9, 2012

The Case for a new Reference Distro for illumos

With the recent announcement and discussion around OpenIndiana, I think it's time to bring to the light of day an idea I've been toying with for a while now -- and have discussed with a few others in the community.

illumos needs a reference distribution.

Now just hold on there... before you get all wound up, let's discuss what I mean by a reference distribution.


  • A reference distribution should be a base for additional development, and testing. 
  • It should support automated installation for mass deployment in test lab/QA/build farm scenarios.
  • It must support "self-hosting" -- i.e. illumos-gate itself should be buildable on this distribution -- with as few additional steps as possible.
  • It must be valid as a QA platform -- for example, it needs to support the packaging tools we use in order to validate the packaging metadata (manifests).  (Yes, this means it needs to support - and be packaged with - IPS -- in spite of my loathing of IPS itself.)
  • It should be a reasonable (but not the only reasonable possible) base to use for other distributions looking to build on top of it.
  • Ideally it should be pretty darn minimal.
  • It may have graphics support, e.g. an X server, but only in so far as having that would support validation of technologies in illumos-gate.  (Sorry, no big desktops, Gnome, web browsers, or any of that crap.)
  • Community driven - it contains the things that the entire community agrees upon as our common core.
  • It must require only a modicum of effort around release engineering - none of these multi-week release cycles.
  • It should be released biweekly with a numbered build, and probably also have nightlies.  These should be automated.
  • Multi-platform - i.e. it needs to support both SPARC and x86.


As a common reference, it must use illumos-gate verbatim, i.e. no local changes.

Here's what it is not:

  • A general purpose desktop.
  • A general purpose server.
  • A hypervisor.
  • An appliance.
  • Subject to the whims of any corporate entity.
  • An experimental version (it isn't Fedora!)

Why do we need this?  OpenIndiana has been filling the role -- but OpenIndiana is insufficient for several reasons:

  • OpenIndiana is too big and unwieldy -- both to build, and to install.
  • OpenIndiana maintains their own fork of illumos-gate.
  • OpenIndiana needs the freedom to follow their own course - without being too strictly tied to illumos-gate.
  • OpenIndiana lacks any kind of formal QA.  (The recent debacles around 151 are good examples of this.)
  • OpenIndiana is not built for SPARC.

(Some of the above items could be addressed with OI, but I think addressing all of them is nigh impossible without totally changing the role of OI.)

So, here's what I propose we do instead.

  • I propose that we start with OpenIndiana or OmniOS, but using core illumos-gate (no fork), and we remove things that are not relevant to the reference.  E.g. Gnome, etc.
  • We will need lots of assistance with release engineering - I want a formal RE plan in place for this thing, with regular scheduled builds and releases following the "train" model rather than the "station wagon" model.
  • We will need to set up some central build and test servers.  (I hope to get some funding for this through the Foundation).
  • We will need to have formalized QA.  (Imagine a Jenkins instance that runs a build and regression test suite every night.)
  • The bug database should add fields to reference formal build numbers.  (Introduced-In, Fixed-In, Reported-In, etc.)
  • We do not need "an installer".  So the image can just be a basic ISO or tarball, probably.  Think minimalist.  I don't want any customization in the installer, excepting possibly the identification of the initial disk to install to.
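
To make the "train" model concrete, here is a minimal sketch of how an automated job might stamp each biweekly build with a monotonically increasing number.  The counter file location and the "ird-NNN" tag format are my own invention for illustration, not an agreed-upon scheme:

```shell
#!/bin/sh
# Hypothetical build-number stamper for automated biweekly releases.
# The counter path and tag format below are assumptions, not a real scheme.

COUNTER=${COUNTER:-/var/tmp/ird_build_number}

# Read the last build number, increment it, persist it, and print a tag
# such as "ird-001" for use by the build and publishing steps.
next_build() {
    n=0
    [ -f "$COUNTER" ] && n=$(cat "$COUNTER")
    [ -n "$n" ] || n=0
    n=$((n + 1))
    echo "$n" > "$COUNTER"
    printf 'ird-%03d\n' "$n"
}
```

A Jenkins job could call `next_build` at the start of each scheduled run, then hand the resulting tag to the build and to the bug database fields (Introduced-In, Fixed-In) proposed above.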

Ultimately, if we're very successful, then this will help the entire illumos community.  This reference may serve as a base for OpenIndiana or OmniOS to build upon.  Put another way, I want to make OI's job easier, but I don't intend that this should replace OI.

So, what do I want from the community?

I want to hear from people who think this is a terrible idea.  A descriptive criticism of concerns is far more useful than just another +1.  (If you want to +1 it, do it using Google or FB to like it -- don't add a +1 comment.)  Post your concerns directly to this thread, so that we can keep them for posterity.

I want to hear from people who want -- and are able -- to actively contribute to this.  Most of this effort will be volunteer based initially, but I think we'll eventually need to staff some work from the Foundation.  Corporate sponsors may help here as well, either by loaning resources or by gifting earmarked funds.

Btw, for folks that want to debate names -- let me kill that now.  I want to name this "IRD" -- the illumos Reference Distribution.  It's about as unsexy as possible -- and that is the point -- this thing should be viewed as something the community uses internally, and not something that is ever given marketing attention.  (Those spotlights belong to illumos itself, and to the other end-user oriented distributions.)

To be clear, I'm not 100% certain that this is the way to go -- but I'm heavily leaning in that direction.  While I can't force the rest of the community to follow along, I can allocate resources to it, and I want to make sure that we can address any major concerns before I start investing resources here.

Thanks.

Wednesday, August 29, 2012

Response to Alasdair's Resignation


[ Recently, Alasdair Lumsden announced his retirement as leader of the OpenIndiana community.  This is a copy of my response to that. ]

Dear Alasdair,

Here at the illumos Foundation, we are very sad to hear of your resignation as lead of the OpenIndiana distribution. You have demonstrated great leadership and have been a wonderful friend of the community. We understand that such projects can be very time consuming, and we wish you all the best in your other endeavors. 

As a community, many of us share some of your frustrations. At times, progress does appear to be slow. At other times, there may appear to be a lack of interest from many of the former Sun employees. However, although you may feel that there is a lack of support or interest, many of us active in the community feel differently. While OpenIndiana served to carry on the banner of the OpenSolaris distribution, it was obviously a dead end because the desktop wars are over, and even the Linux community could not win. Meanwhile, the flourishing embedded and special purpose markets for illumos technology continue to grow. 

For example, use of illumos technology is growing in the areas of internet and cloud infrastructure, database and storage appliances, and embedded products. Many successful companies are driving the technology forward and the future has never looked brighter for open source developers looking to contribute to the community and find interesting jobs.

You may think we fail to see that the expanding free and open source software marketplace is not our total playground.  What we do see is a talented, dedicated group of volunteers committed to seeing illumos become an important part of specialized, niche technology solutions, especially for solving critical problems facing the Internet.

Honestly, we do not expect the illumos core itself to be economically viable. We do expect--and see--distributions built upon the illumos core becoming commercially successful and growing. This model is similar to the successful Linux model, where the distributions can achieve different results based on the needs of the constituents or the community; some commercial, some less so. 

The model of OpenSolaris is broken. The model of OpenIndiana following OpenSolaris is broken. The illumos model is following the successful Linux model. This is exemplified by distributions such as the commercially supported, general purpose OmniOS, Joyent’s SmartOS, Delphix OS, and others. 

In the future, we expect illumos to become a bigger part of the technological community, as many companies now emerging from stealth mode are incorporating our core platform functions. As our community grows, and word continues to spread of our stability, dependability and scalability, we hope to take illumos to even more specialized markets. 

Among the illumos community of volunteers are some of the smartest, most talented, most dedicated and hard-working people I’ve ever had the pleasure of being associated with. Their years of experience, combined with their insight into the serious problems facing the Internet, assure a rock-solid future for illumos. As a community, we have a shared set of values, and while sometimes we’ve had our disagreements, it’s always been those shared values that have come through and helped the community - and the technology we are building - grow and improve. We will continue to build an enterprise-grade, highly dependable operating system. 

Again, we are saddened by your leaving. On a personal note, I have enjoyed the collaboration that we shared in the formative days of illumos and OpenIndiana, and I hope we will be able to do so in the future.  We thank you for all your efforts on behalf of OpenIndiana and illumos, and know that your efforts have not been in vain; we will take all that we have learned under your leadership and build on it. As a natural evolution of the community, distributions come and go; we take the lessons and move on. Our community has never been one to back down from a challenge; we will handle this in stride and continue to move forward with our mission and our future goals. 

Sincerely,

Garrett D’Amore
illumos Founder

Wednesday, June 13, 2012

Women in Tech Advertising - a rebuttal

So, there have been some who have argued vehemently against the advertising policies of GoDaddy, and its ilk.  (See https://twitter.com/bdha/status/211210555577466881 for an example of the complaints.)

I have a different point of view.  What about sites that always picture a beautiful woman on a headset for their representation of tech support?  Should we also boycott them?  Would you prefer a site with an ugly person answering the phone?  After all, it's a phone or text chat, so it shouldn't matter at all what the person looks like, right?

Conversely, what about all the sites that use men in their advertising?

My hosting provider, bluehost.com, uses both.  Take a look.  Does the pretty blonde in glasses have anything to do with domain names or web hosting?  Of course not.  The guy wearing the headset for the tech support line isn't exactly hideous either, if you swing that way.  Are you likely to talk to this guy if you call?  Probably not -- he's probably just another model.

GoDaddy uses Danica Patrick as their spokesperson.  She's admittedly beautiful, a professional athlete (she drives race cars), and probably makes a fair bit of money endorsing GoDaddy.  Some of the ads she has been in have admittedly been a little tongue in cheek in their appeal to a mostly male clientele, but I think it would be a stretch to say that GoDaddy is using sex to sell domain services.

It would be an even further stretch to say Danica is being exploited by GoDaddy.

(Note that there may be other reasons to avoid doing business with GoDaddy or its CEO, but I am only referring to the use of Danica Patrick, a female spokesperson, and women in advertising in general.)

Would the people who claim that this is somehow evil objectification of women complain as loudly if the spokesperson were a good looking baseball star?  Is this objectification of men, or of male sports stars?  What if the ad somehow appeals more strongly to women, or to gay men?

What if the site featured an animal spokesperson prominently?  Say an orangutan?  (Perhaps with computer animation. :-)  Would this be considered animal exploitation?

I submit to you, dear reader, that each one of these cases is basically the same.  We use humans, animals, or whatever, to provide a human element and appeal in a product being sold.  Sure, it would be nice if the site instead featured some kind of graphic that demonstrated its technical superiority, but come on -- we're talking about a domain registrar, which is about as boring as tech gets.  I can hardly blame GoDaddy for trying to liven up their site and generate some enthusiasm.

In fact, even in the more extreme cases, such as the use of bikini-clad booth babes, I submit that folks who think objectification like this is somehow exploitative are misunderstanding.  These women are very rarely without choices and options.  Usually by the time we see them, it's because they are in a job that they have worked hard to get -- maintaining a perfect figure can be a full-time job, after all.

If folks like the author of the aforementioned tweet had their way, they would deny these women the opportunity to earn a living showing off their own accomplishments.  Isn't that even more exploitative?  In the extreme case, consider the more repressive societies of the Middle East.  Is that the social ideal to which we should aspire?  Where the only part of a woman we see in advertising is a burqa?

(Note that in the case of the site using bikini-clad booth babes, my gut response is to question what the company is not showing -- their product.  So while the babes may catch my eye, they have yet to draw me to a booth I would not otherwise have visited for the products featured.  Actually, at one former employer a female colleague -- also quite beautiful, but quite knowledgeable -- is aware that she is often mistaken for a booth babe, and regularly uses this to her advantage; customers are often surprised when the beautiful woman knows more about the technology than most of the men do.)

As for me, I welcome the diversity of our society, and the appreciation of the human form.  If some people are able to use this to humanize their marketing approach, good for them!  Let's face it, I'd rather look at beautiful faces online than ugly ones.

Far better, IMO, for those who would fight against this to spend their energies worrying about the women who are truly exploited or repressed -- in the fundamentalist Muslim societies of the Middle East, in the brothels, and in the human slave trade.  Don't feel sorry for the woman who is able to make a nice living for herself displaying the body she has worked hard to get and maintain, and that she is proud of.

And yes, for the record, I usually like Carl's Jr. commercials too.

Monday, June 4, 2012

Ivy Bridge Motherboards -- Don't *Really* Exist!

So, if you're like me, and you've been shopping for a new system lately because your last one is dying or dead, you might see some fancy looking Intel E3-1200v2 Xeon systems.  (Why Xeon?  Because ECC memory is a good thing.  I attribute some of my difficulty with my last system to using non-ECC memory.  Because it's not ECC, I don't really know whether the problem was in RAM or CPU, but I digress...)

Great, so you pick out an E3-1240v2 Ivy Bridge CPU, and then you look for a system board.

I found the Supermicro X9SCA-F-O.  Looks like a sweet board.  The spec claims support for the E3-1200v2 CPUs, and 1600MHz RAM.  It has separate IPMI LAN support, as well.  Sweet!  (Look closer though....)

So, package arrives from NewEgg, and you're ready to go.

Assemble the kit.... if you're a software guy like me this might take a couple of hours.

Power it on.

Beep!  Beep!  Beep!  Beep!

WTF does that mean!?  Google "4 beeps supermicro".

Oh, "Unsupported processor."  Further reading....

The system needs a BIOS update (to v2.0) in this case.  Seems reasonable.

Download the instructions....

"Step 1. First, boot the motherboard with an E3-1200 series (not E3-1200 V2)
processor..."

WTF!?!  A new E3-1200 (that's Sandy Bridge mind you, still fairly recent stuff) processor is going to cost me like 200 bucks, minimum.  And I can't return it after I buy it except to exchange it with a different CPU.  But I already have the E3-1240v2.

Anyone want to buy a very lightly used E3-1220 from me?  It looks like I'll be purchasing one tomorrow.  Ultimately, I can't afford not to burn the $200, because I simply don't have the time to keep screwing around with this stuff.  I've burned the better part of today trying to wrangle hardware.  (Hmmm.... where's a Systems Engineer when you need one?)  Conversely, anyone got one lying around they want to give, loan, or sell really cheap?

Shame on you Supermicro.  There are at least several things you could have done better here:

1. Make it much clearer that a buyer of this board needs a legacy CPU first.  (I.e. that Ivy Bridge support is only for people upgrading CPUs and not building new systems.)

2. Have a separate SKU for systems that have the requisite BIOS changes.  Even if the silk-screen on the board is the same (and it shouldn't be, because the silk screening indicates only 1066 and 1333 MHz RAM supported, although the BIOS enables 1600 MHz), having the separate SKU lets buyers make sure they have the bits needed.

3. Work with retailers to ensure that existing stock is returned and upgraded as soon as possible.

4. Offer an expedited RMA process for people like me who are screwed in the meantime.  (To be fair, I've not called supermicro yet, but Googling tells me people have not had good luck with this option yet.  The only solution that has worked for people is to get their own 2nd processor.)

5. Technically, this board has an IPMI unit on it.  I think that means it has a separate microprocessor. Why can't I update the BIOS using that?  That would eliminate the need to buy a second CPU, at least for those of us with IPMI enabled gear.

Lessons for me:

1. It's probably worth a couple hundred bucks to let someone else put together the system for me.  I'm a driver guy -- futzing around with actual bits of silicon and cabling is not my forte.

2. There's a reason it's called the bleeding edge.  I think I got cut with it today.

Sunday, June 3, 2012

Why I *hate* external dependencies!

So, I've been trying to set up a system that can build illumos-gate, because my old system had a memory DIMM fault and isn't usable anymore.

I thought, hmm... let me try something other than OpenIndiana.

Big mistake.

Only OpenIndiana seems able to build vanilla illumos-gate right now.

Why?  Because of external dependencies!  I tried building on OmniOS.  The stopper there -- Python 2.4!  On SmartOS, the dependency is IPS itself.  It seems only OpenIndiana is suitable for building stock illumos-gate.

The problem is that our build is inherently very sensitive to the build environment.  It makes life incredibly unpleasant if you try to build on any system that is not configured exactly as we specified.

This, IMO, is an unacceptable situation.
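
One way to blunt that sensitivity -- a sketch of my own, not anything the gate actually ships -- is a preflight check that fails fast when the build host lacks the expected tools, rather than dying hours into a build.  The tool names passed in below are purely illustrative:

```shell
#!/bin/sh
# Hypothetical preflight check for a build host.  The tool names used
# here are illustrative only -- not illumos-gate's actual requirements.

# Print the names of any tools from the argument list that are not
# found on PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "$missing"
}

# Example: verify a couple of tools any POSIX host should have.
missing=$(check_tools sh ls)
if [ -n "$missing" ]; then
    echo "missing build tools:$missing" >&2
else
    echo "build host looks sane"
fi
```

A check like this could run at the top of the build wrapper, listing every external runtime the tree depends on -- which would at least make the dependency problem visible up front.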

I will speak loudly against any attempt to make changes that introduce further external dependencies.  The fact that some already exist is no excuse.  Every external dependency makes working on the tree more painful.  If you're thinking of a change that would move something out of tree, but add a build-server dependency, think again!  

These dependencies are a serious barrier for contribution.

We (illumos) should take a page from NetBSD, who have successfully arranged for their environment to be self-contained, to the point that they don't care about their build server at all -- all it needs to be able to do is bootstrap gcc and run automake/autoconf.  There is something very elegant about this.

Expect to see me on yet another crusade to kill dependencies on external runtimes, be it Python, Perl, or whatever.   Our tree should, IMO, be as "stand-alone" as possible.

If that means you have to write C, so be it.  Grow up and get a real compiler dude.  I know languages like Python and Perl are attractive because they have a lower barrier to learn -- but depending on them -- and far worse -- depending on *specific* versions of them -- is toxic for anyone else who wants to work on the code in a different environment.   (This has nothing to do with the language of your choice, but everything to do with the pain and suffering your language choice creates for anyone who just wants to do "make install".)