Wednesday, September 1, 2010

OpenSolaris ARC is Dead

I tried to dial in to ARC today, with no luck. Then someone pointed out that we have not seen any ARC cases since the tap was turned off.

In fact, I posted a query about this to the opensolaris-arc mailing list today, and I got back an interesting automated reply:

This mailing list is no longer active and accepting posts. Mailing
list archives can be found at
http://mail.opensolaris.org/pipermail/opensolaris-arc/. You can check
http://mail.opensolaris.org/mailman/listinfo to find another list to
which to send your email.


So, OpenSolaris ARC is dead. This has ramifications that go beyond just ON, because there are other consolidations that we were promised would continue to be developed in the open: JDS, X11, and the pkg-gate. If the decisions for these technologies are no longer being made openly, or even the opinions made available, then Oracle's promise to continue to work with the community on them seems hollow.

So, what's left for "OpenSolaris" as so named? There are some code drops still being made. How long will that keep up? Are they continuing to take contributions from external parties? (I don't work on those gates, so I don't really know.) I'd like to know if the other consolidations have shut down too. At the very least, the key decisions relating to those consolidations seem to have moved behind closed doors.

Monday, August 23, 2010

OGB has dissolved today

The old OpenSolaris Governing Board has dissolved unanimously today.

The OpenSolaris governance is now in default, and returns to Oracle's hands.

For folks upset by this, let me remind them of Illumos. It's a sad note for OpenSolaris, but I think the reborn Illumos community will be better than the OpenSolaris community ever could be.

I do want to thank the (former) OGB members for their efforts, even if they did prove to be in vain.

Sunday, August 22, 2010

Why SAS->SATA is not such a great idea

So, we've had some "issue" reports relating to the mpt driver. In almost all cases, the problems arise in situations where people are using SATA drives and hooking them into SAS configurations.

Although the technology is supposed to work, and sometimes it works well, it's a bad idea.

Let me elaborate:

  • SAS drives are generally subjected to more rigorous quality controls. This is the main reason why they cost more. (That and the market will pay more.)
  • SAS to SATA conversion technologies involve a significant level of protocol conversion. While the electricals may be the same, the protocols are quite different.
  • Such conversion technology is generally done in hardware, where only the hardware manufacturer has a chance of debugging problems when they occur.
  • Some of these hardware implementations remove debugging information that would be present in the SATA packet, and just supply "generic" undebuggable data in the SCSI (SAS) error return.
  • The conversion technology creates another potential point of failure.
  • Some of these hardware implementations won't be upgradeable, or at least not easily upgradeable, with software.
  • SATA drives won't have a SCSI GUID (ATA specs don't require it), and so the fabricated GUID (created by the SAS converter) may be different when you move the drive to a different chassis, potentially breaking things that rely on having a stable GUID for the drive.

Don't get me wrong. For many uses, SATA drives are great. They're great when you need low cost storage, and when you are connecting to a system that is purely SATA (such as to an AHCI controller), there is no reason to be concerned.

But building a system that relies upon complex protocol conversion in hardware just adds another level of complexity. And complexity is evil. (KISS.)

So if you want enterprise SAS storage, then go ahead and spring for the extra cost of drives that are natively SAS. Goofing around with the hybrid SAS/SATA options is just penny wise and pound foolish.

But hey, it's your data. I just know that I won't be putting my trusted data in a configuration that is effectively undebuggable.

(Note: the above is my own personal opinion, and should not be construed as an official statement from Nexenta.)

Aug 30, 2010: Update: We (meaning Nexenta) have now verified, at a significant account, that SAS/SATA expanders combined with high loads of ZFS activity are highly toxic. So, if you're designing an enterprise storage solution, please consider using SAS all the way to the disk drives, and just skip those cheaper SATA options. You may think SATA looks like a bargain, but when your array goes offline during ZFS scrub or resilver operations because the expander is choking on cache sync commands, you'll really wish you had spent the extra cash up front. Really.

IPS == FAIL

Look, I really, really wanted to avoid entering the packaging debate. I mean, it's an emotional decision, right?

Well, it's supposed to be.

Except that I've spent nearly an entire day trying to figure out how to onu the latest illumos gate (which includes Rich Lowe's b147 merged in). I have gate changes that I desperately need to test in the context of a full install. (Well, I could say "screw it", and just test the bits in place -- which I've already done, but that's hardly a complete test.) I can't test them. Because I can't figure out how to use the packaging system to install them. And neither can our resident IPS expert, Rich Lowe.

This is no longer an emotional decision for me. Yeah, there are a lot of "emotional" things not to like about IPS. (It forces a dependency upon Python; it's still immature; it seems to fail if you are disconnected from the network; it doesn't seem possible to build and install "just" a single package; apparently there are a lot of magic incantations that nobody outside of the IPS developers really understands; etc.) I was willing to set aside all those "emotional" responses and use IPS, if it worked. If for no other reason than that it did away with BFU, I have been willing to give it my best effort. But the latest situation has left me dead in the water, and apparently NO ONE can help me.

Look, I'm not a complete moron. (Well, maybe you disagree with me, but this is my blog.)

I should be able to make this work. If I cannot, then what kind of barrier is this going to create for participation from other people? Is Rich Lowe going to hold the hands of everyone else to get past these issues?

What happens the next time the pkg folks introduce another flag day?

This is unacceptable.

I'd like to hear other solutions. At the moment, I'm very seriously considering gutting the IPS build requirements and having illumos go back to building SVR4 packages natively, using a tool to convert IPS metadata. (So the metadata would be IPS, but the binary deliverable would be auto-generated SVR4 packages.)
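To make that idea a bit more concrete, here is a minimal sketch of what such a converter might look like. It is purely illustrative and is not an existing illumos tool: the function names are made up, it assumes manifest actions of the form "file path=... mode=... owner=... group=...", and it only handles file, dir, link, and hardlink actions, emitting prototype(4)-style lines in the default "none" class. Everything else (dependencies, drivers, licenses, and so on) is simply skipped, which a real tool obviously could not get away with.

```python
import shlex

# Hypothetical mapping from IPS action names to SVR4 prototype(4) file types.
FTYPES = {"file": "f", "dir": "d", "link": "s", "hardlink": "l"}


def ips_action_to_prototype(line):
    """Convert one IPS manifest action into a prototype(4) line.

    Returns None for blank lines, comments, and actions this sketch
    does not understand.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = shlex.split(stripped)
    name = tokens[0]
    if name not in FTYPES:
        return None
    # Keep only key=value attributes; a positional payload (such as a
    # file hash) carries no "=" and is ignored here.
    attrs = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
    path = attrs.get("path")
    if path is None:
        return None
    ftype = FTYPES[name]
    if name in ("link", "hardlink"):
        target = attrs.get("target")
        if target is None:
            return None
        return f"{ftype} none {path}={target}"
    # Assumption: "?" (leave unchanged) stands in for missing attributes.
    mode = attrs.get("mode", "?")
    owner = attrs.get("owner", "?")
    group = attrs.get("group", "?")
    return f"{ftype} none {path} {mode} {owner} {group}"


def convert_manifest(text):
    """Convert a whole manifest, dropping actions that were skipped."""
    converted = (ips_action_to_prototype(l) for l in text.splitlines())
    return [p for p in converted if p is not None]


if __name__ == "__main__":
    sample = """\
# sample manifest fragment (made up for illustration)
dir path=usr/bin mode=0755 owner=root group=bin
file path=usr/bin/tr mode=0555 owner=root group=bin
link path=usr/bin/example target=tr
"""
    for entry in convert_manifest(sample):
        print(entry)
```

The resulting lines would then be fed to the usual SVR4 tooling (pkgmk and friends) along with a generated pkginfo file; that part is not shown.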

The current situation reminds me of Linus' comments about CVS. I feel the same way about IPS right now. I'm very angry ... the tools that are supposed to facilitate development have caused it to cease for me. If the only way for me to move forward is to reinvent SVR4 build systems, then that's what I'll do.

IPS is a failed science experiment. I don't see how it is going to get widespread adoption from anyone (ISVs or otherwise) with it as it stands today.

Flames to /dev/null. Let me know if you have a solution though.

Update: Rich was finally able to get me to a working state, although I can't ever downgrade. After what I just went through, I never want to. I'm really terrified that nobody really understands the steps it took to get me there, and I am unwilling to force others to go through the same nightmare. So I'm still mad at IPS, and I still think we need to unhitch the illumos cart from it.

Thursday, August 19, 2010

The Tap Is Turned Off

A little birdie told me that the last update to Oracle's hg repository for ON was this one:

changeset: 13149:b23a4dab3d50
tag: tip
user: Sukumar Swaminathan
date: Wed Aug 18 15:52:48 2010 -0600
description:
6973228 Cannot download firmware 2.103.x.x on Emulex FCoE HBAs
6960289 fiber side of emulex cna does not connect to the storage
6950462 Emulex HBA permanently DESTROYED, if the firmware upgrade is interrupted
6964513 COMSTAR - Emulex LP9002 fail to return a SCSI Inquiry correctly to a VMware 4 Initiator

From here on out, Illumos and Oracle Solaris diverge. The funny thing is, based on the calls I've had today, I could hardly be more optimistic about the future of illumos and the code base that was formerly called Solaris. Even more talent is getting behind this effort every day.

I'm very, very excited... frankly, Oracle shutting off the tap has opened up the opportunity for us to really start innovating, in ways that I would have been loath to do if we were still trying to maintain a very closely aligned source tree.

I think it's entirely possible that Oracle may wind up viewing Illumos as the upstream rather than the reverse!

More milestones...

Illumos milestones reached today.

a) I pushed a working tr, and was able to build illumos on a system running illumos. This is the first time this has been possible.

b) Rich Lowe pushed a merge to build 147. There are probably consequences for developers (more updates required for bits that are not part of ON) -- stay tuned for updates about that.

All in all, things are moving quickly.

Tuesday, August 17, 2010

Presenting Illumos at SVOSUG

I'm pleased to announce that I'll be giving a brief talk at this month's SVOSUG meeting, Thursday Aug 26, at 6:45 pm in Mountain View. It will cover Illumos, and I will be joined by a colleague who will talk a bit more about Nexenta as well. If you're in the Bay Area at that time, it would be great to have a chance to meet.

I expect there will be some (probably significant) consumption of alcoholic beverages after the meeting, at an as yet undetermined location.