Monday, August 23, 2010

OGB has dissolved today

The old OpenSolaris Governing Board (OGB) voted unanimously today to dissolve itself.

Governance of OpenSolaris now reverts, by default, to Oracle.

For folks upset by this, let me remind them of Illumos. It's a sad note for OpenSolaris, but I think the reborn Illumos community will be better than the OpenSolaris community ever could have been.

I do want to thank the (former) OGB members for their efforts, even if they did prove to be in vain.

Sunday, August 22, 2010

Why SAS->SATA is not such a great idea

So, we've had some "issue" reports relating to the mpt driver. In almost all cases, the problems occur in configurations where people have hooked SATA drives into a SAS topology.

Although the technology is supposed to work, and sometimes it works well, it's a bad idea.

Let me elaborate:

  • SAS drives are generally subjected to more rigorous quality controls. This is the main reason why they cost more. (That, and the market will pay more.)
  • SAS to SATA conversion technologies involve a significant level of protocol conversion. While the electricals may be the same, the protocols are quite different.
  • Such conversion technology is generally done in hardware, where only the hardware manufacturer has a chance of debugging problems when they occur.
  • Some of these hardware implementations remove debugging information that would be present in the SATA packet, and just supply "generic" undebuggable data in the SCSI (SAS) error return.
  • The conversion technology creates another potential point of failure.
  • Some of these hardware implementations can't be upgraded in software, or at least not easily.
  • SATA drives won't have a SCSI GUID (ATA specs don't require it), and so the fabricated GUID (created by the SAS converter) may be different when you move the drive to a different chassis, potentially breaking things that rely on having a stable GUID for the drive.
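To make the GUID point concrete, here is a toy sketch. It is purely hypothetical and does not model any real converter; actual bridges each have their own rules for inventing an identifier. It contrasts an ID fabricated from *where* the drive is attached with one derived from the drive itself. The function names, serial number, and SAS addresses are all made up for illustration.

```python
import hashlib

# Hypothetical model, for illustration only: real SAS/SATA bridges each have
# their own (often undocumented) rules for making up an identifier.

def fabricated_id(attach_sas_address: int, phy: int) -> str:
    """ID a converter might invent from where the drive is attached."""
    key = f"{attach_sas_address:016x}:{phy}".encode()
    return hashlib.sha1(key).hexdigest()[:16]

def drive_derived_id(ata_serial: str) -> str:
    """ID derived only from the drive itself, so it travels with the drive."""
    return hashlib.sha1(ata_serial.encode()).hexdigest()[:16]

serial = "EXAMPLE-SERIAL-123"  # hypothetical ATA serial number

# The same physical SATA drive, first in chassis A, then moved to chassis B
# (the SAS addresses below are made-up placeholder values).
id_in_a = fabricated_id(0x5000C50000000000, phy=3)
id_in_b = fabricated_id(0x5000C500FFFF0000, phy=3)

# Something (a volume manager, a multipath config, ...) remembered the old ID.
known_disks = {id_in_a: "pool disk 0"}

print(id_in_b in known_disks)                                # False: looks like a new disk
print(drive_derived_id(serial) == drive_derived_id(serial))  # True: stable ID
```

The hashing isn't the point. Anything keyed on a location-derived ID loses track of the disk as soon as the disk moves.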

Don't get me wrong. For many uses, SATA drives are great. They're great when you need low-cost storage, and when you are connecting to a system that is purely SATA (such as an AHCI controller), there is no reason to be concerned.

But building a system that relies upon complex protocol conversion in hardware just adds another level of complexity. And complexity is evil. (KISS.)

So if you want enterprise SAS storage, go ahead and spring for the extra cost of drives that are natively SAS. Goofing around with the hybrid SAS/SATA options is just penny-wise and pound-foolish.

But hey, it's your data. I just know that I won't be putting my trusted data in a configuration that is effectively undebuggable.

(Note: the above is my own personal opinion, and should not be construed as an official statement from Nexenta.)

Update (Aug 30, 2010): At a significant customer account, we (meaning Nexenta) have verified conclusively that SAS/SATA expanders combined with heavy ZFS load are highly toxic. So, if you're designing an enterprise storage solution, please consider using SAS all the way to the disk drives, and just skip those cheaper SATA options. You may think SATA looks like a bargain, but when your array goes offline during a ZFS scrub or resilver because the expander is choking on cache sync commands, you'll really wish you had spent the extra cash up front. Really.


Look, I really, really wanted to avoid entering the packaging debate. I mean, it's an emotional decision, right?

Well, it's supposed to be.

Except that I've spent nearly an entire day trying to figure out how to onu the latest illumos gate (which now includes Rich Lowe's merge of b147). I have gate changes that I desperately need to test in the context of a full install. (Well, I could say "screw it" and just test the bits in place -- which I've already done, but that's hardly a complete test.) I can't test them, because I can't figure out how to use the packaging system to install them. And neither can our resident IPS expert, Rich Lowe.

This is no longer an emotional decision for me. Yeah, there are a lot of "emotional" things not to like about IPS. (It forces a dependency upon Python; it's still immature; it seems to fail if you are disconnected from the network; it doesn't seem possible to build and install "just" a single package; apparently there are a lot of magic incantations that nobody outside the IPS developers really understands; etc.) I was willing to set aside all those "emotional" responses and use IPS, if it worked. If for no other reason than that it did away with BFU, I have been willing to give it my best effort. But the latest situation has left me dead in the water, and apparently NO ONE can help me.

Look, I'm not a complete moron. (Well, maybe you disagree with me, but this is my blog.)

I should be able to make this work. If I cannot, then what kind of barrier is this going to create for participation from other people? Is Rich Lowe going to hold everyone else's hand to get past these issues?

What happens the next time the pkg folks introduce another flag day?

This is unacceptable.

I'd like to hear other solutions. At the moment, I'm very, very seriously considering gutting the IPS build requirements and having illumos go back to building SVR4 packages natively, using a tool to convert IPS metadata. (So the metadata would be IPS, but the binary deliverables would be auto-generated SVR4 packages.)

The current situation reminds me of Linus' comments about CVS. I feel the same way about IPS right now. I'm very angry ... the tools that are supposed to facilitate development have caused it to cease for me. If the only way for me to move forward is to reinvent SVR4 build systems, then that's what I'll do.

IPS is a failed science experiment. I don't see how it is going to win widespread adoption from anyone (ISVs or otherwise) as it stands today.

Flames to /dev/null. Let me know if you have a solution though.

Update: Rich was finally able to get me to the point of working, although I can never downgrade. After what I just went through, I never want to. I'm really terrified that nobody really understands the steps it took to get me to a working state, and I am unwilling to force others to go through the same nightmare. So I'm still mad at IPS, and I still think we need to unhitch the illumos cart from it.

Thursday, August 19, 2010

The Tap Is Turned Off

A little birdie told me that the last update to Oracle's hg repository for ON was this one:

changeset: 13149:b23a4dab3d50
tag: tip
user: Sukumar Swaminathan
date: Wed Aug 18 15:52:48 2010 -0600
6973228 Cannot download firmware 2.103.x.x on Emulex FCoE HBAs
6960289 fiber side of emulex cna does not connect to the storage
6950462 Emulex HBA permanently DESTROYED, if the firmware upgrade is interrupted
6964513 COMSTAR - Emulex LP9002 fail to return a SCSI Inquiry correctly to a VMware 4 Initiator

From here on out, Illumos and Oracle Solaris diverge. The funny thing is, based on the calls I've had today, I could hardly be more optimistic about the future of illumos and the code base that was formerly called Solaris. Even more talent is getting behind this effort every day.

I'm very, very excited... frankly, Oracle shutting off the tap has really opened up the opportunity for us to start innovating, in ways that I would have been loath to pursue if we were still trying to maintain a very closely aligned source tree.

I think it's entirely possible that Oracle may wind up viewing Illumos as the upstream rather than the reverse!

More milestones...

Illumos milestones reached today.

a) I pushed a working tr, and was able to build illumos on a system running illumos. This is the first time this has been possible.

b) Rich Lowe pushed a merge to build 147. There are probably consequences for developers (more updates required for bits that are not part of ON) -- stay tuned for updates about that.

All in all, things are moving quickly.

Tuesday, August 17, 2010

Presenting Illumos at SVOSUG

I'm pleased to announce that I'll be giving a brief talk at this month's SVOSUG meeting, Thursday Aug 26, at 6:45 pm in Mountain View. It will cover Illumos, and I will be joined by a colleague who will talk a bit more about Nexenta as well. If you're in the Bay Area at that time, it would be great to have a chance to meet.

I expect there will be some (probably significant) consumption of alcoholic beverages after the meeting, at an as yet undetermined location.

Monday, August 16, 2010

More new stuff...

I've been pretty busy with Illumos lately, but last week I took a few days off for family time.

One of the things I did was take my son (9 years old) out to the Kern River to try some whitewater kayaking. This was his first time on moving water, and it amazed me how quickly he picked up basic concepts. He was doing ferries, peel outs, and eddy turns like a champ after about 20-30 minutes. Amazing. He didn't even swim his first day -- he elected to stay in his boat (actually trying to do a roll) until I could give him an Eskimo rescue. (His only swim that day was when he got flipped by one of the holes in Riverside Park.)

He did get a good swim on the second day, when we were working on ferries through the much faster swift water running at the bottom of Ewings rapid. His first ferry was quite high into the rapid itself, and clean, but the second time he went for a swim. He came up happy and smiling, ready to try again if we had had the time.

I wish I had some pictures.

Guess I'm gonna have to get the kid a boat soon. He wants to try kayak surfing with me, and he really wants to learn to roll. Too bad no vendors in Southern California offer whitewater boats small enough for kids. We probably won't make it to Kernville again until next season. :-(

Sunday, August 15, 2010

Milestone Commit for Illumos

Richard Lowe has just made a milestone change to the Illumos repository.

It's a milestone for two reasons:

a) It is the first commit from a developer other than me. (Other developers have code in progress that isn't ready to commit yet, but it will be soon!) This also makes it truly a community project, since Rich has no affiliation with me other than as a participant in the Illumos project.

b) It eliminates the dependency on the Oracle "extra" repository, which required folks to get a certificate to access non-redistributable code in order to build illumos.

Thank you very much, Rich. I'm looking forward to more integrations from developers soon!

Friday, August 13, 2010

The Hand May Be Forced

Well, as you may have read, Oracle has decided that at some point very soon, we're going to lose regular access to the source code for OS/Net (i.e., the Solaris kernel and supporting programs).

While I would have vastly preferred for Illumos to have a cooperative and collaborative relationship with Oracle, it appears that Oracle doesn't value this. In fact, the exact words from Oracle management were as follows:

Solaris is not something we outsource to others, it is not the assembly of someone else’s technology, and it is not a sustaining-only product.

While I understand the need to own the technology, few statements could show a stronger NIH attitude than this. It's unlikely that there will ever be a way for Oracle and the greater community to have a collaborative relationship.

This is a dark day for OpenSolaris -- it's effectively dead now. (Its parent, Solaris, lives on, however.)

How unfortunate.

For Oracle that is.

Because from the fertile ashes of the dead springs forth new life bringing hope and light in the form of Illumos.

Illumos has garnered the support of some of the top minds in the industry; the list of Solaris contributors and potential contributors who have already publicly committed to supporting this project is extensive. Many of the names are famous, people like Bryan Cantrill. Oracle's actions and inaction have actually made this possible.

I can also say that the list goes even further -- considerably so. I have had private conversations with quite a few other people who have quietly committed to involvement. Some of the names are very surprising, and I hope that they will soon be in a position to announce their involvement themselves. These are big-name contributors; folks who have made very large numbers of code commits to Solaris -- to some of the deepest and most "challenging" parts of Solaris, too.

The upshot of this is that the future for Illumos is surprisingly bright. Rather than a dependency on the good will of one corporate sponsor with dubious intentions, the project will have the diverse backing of some of the most well-known innovators (and their employers) from the OpenSolaris -- nay, Open Source -- community.

So, by their actions here, Oracle may be forcing Illumos to "fork", which was always a prospect, even if not one I cherished. But with the backing of the innovators I know are with us, I think we have a chance to actually be the premier foundation for SunOS-derived technology. Oracle may be investing more in Solaris, but if the best and brightest have left for greener pastures and are contributing to Illumos, then I think we'll have the "best" investments in the base. Following Oracle's lead when the brightest minds have already left looks less desirable by the moment. (And to be fair, there are still many bright folks within the Solaris organization at Oracle. But the balance is changing, and changing in favor of Illumos and the open development community.)

Oracle Solaris will not be the only source for this technology, and now it appears it may not even be the best source for this technology.

I once said I never intended for Illumos to compete with Solaris. That was true, but if Oracle forces the issue, then even despite their vast economic resources, I say, "Bring it!"

Tuesday, August 3, 2010

Illumos Announcement

Today we announced the Illumos Project. I think the call I gave on it had a lot more information than I want to write here, and there are now quite a number of blog postings from other, more recognizable names than my own. I'm thrilled by the excitement here!