Tuesday, September 7, 2010

We're Hiring!

In case you didn't know, a number of companies are hiring illumos talent.

I know of an opening for a USB kernel engineer at one company.

I'm told Joyent is growing like crazy.

And Nexenta is hiring! In fact, here are some of the opportunities we have open at Nexenta:

  • QA leads. We have two positions for folks with the skills and knowledge to design, build, and run automated testing of the operating system, with a particular focus on storage and networking. Expertise in NFS, CIFS, iSCSI, ZFS, and the surrounding areas would be highly useful. Good communication skills, shell scripting or Perl skills, and the ability to work in the office in Mountain View are all required. Previous QA leadership experience preferred.
  • Support engineers. We need support engineers across the globe: people who can answer the phone and triage problems. Solaris or UNIX experience, ZFS clue, good troubleshooting and triage skills, and excellent communication skills are necessary.
  • Kernel software engineers. I need people with deep TCP/IP, SCSI, storage, and filesystems expertise. Solaris expertise is highly preferred, but FreeBSD or Linux kernel expertise can substitute. Highly motivated, self-driven superstars only.
  • Sustaining software engineers. Excellent troubleshooting skills and kernel expertise are required. Expertise in one or more of TCP/IP, SCSI, storage, and filesystems is preferred. Solaris expertise highly preferred.
  • IT staff. We have one opening for a mid-level IT engineer. Must be able to deal with Solaris, Linux, Windows, phones, and cantankerous development staff.
I expect even more growth here over time. A jobs board for illumos will be coming soon.

Friday, September 3, 2010

Squash-proof?

So everyone has heard me talk about the 800 lb. gorilla with respect to illumos.

One question I keep getting asked is, can the illumos project be "squashed" by this 800 lb. gorilla?

My stock answer had been "no". But I realized something today: I've been wrong.

The way illumos could be "killed" is if the corporate owner of Solaris were to do something to make illumos irrelevant. Like, say, opening Solaris back up (and in that case, I think they would probably need to go even further open than they were before).

I'm not worried though. Even if that happens, illumos will have been a major success. But I really don't think it is going to happen.

Wednesday, September 1, 2010

illumos Interest Groups

So, I've been asked by several people who are involved with OpenSolaris User Groups around the world about illumos.

Given the clear demise of OpenSolaris, it seems, to me at least, kind of silly to continue meeting under that name.

Some groups have reverted to pure Solaris usage, which is fine for groups that want to focus on Oracle products and come under Oracle's umbrella for user groups.

For groups that are more interested in open technology, perhaps it is time to start some "illumos interest groups" (IIGs)? (Calling them "user groups" at this point seems rather premature; I think only a very few of us are actually "using" illumos right now, but I hope that number grows a lot, and very soon.) :-)

By the way, are there any folks interested in illumos in Riverside County or North San Diego County, California? I'd be interested in participating in an interest group if there were one that didn't require me to drive over an hour to get to it.

OpenSolaris ARC is Dead

I tried to dial in to ARC today, with no luck. Then someone else pointed out that we have not seen any ARC cases since the tap was turned off.

In fact, I posted a query about this to the opensolaris-arc mailing list today, and I got back an interesting automated reply:

This mailing list is no longer active and accepting posts. Mailing
list archives can be found at
http://mail.opensolaris.org/pipermail/opensolaris-arc/. You can check
http://mail.opensolaris.org/mailman/listinfo to find another list to
which to send your email.


So, OpenSolaris ARC is dead. This has ramifications that go beyond just ON, because there are other consolidations that we were promised would continue to be developed in the open: JDS, X11, and the pkg-gate. If decisions about these technologies are no longer being made openly, and the opinions are not even being made available, then Oracle's promise to continue working with the community on them seems hollow.

So, what's left of "OpenSolaris" as such? There are still some code drops being made. How long will that keep up? Are they still accepting contributions from external parties? (I don't work on those gates, so I don't really know.) I'd like to know whether the other consolidations have shut down too. At the very least, the key decisions relating to those consolidations seem to have moved behind closed doors.

Monday, August 23, 2010

OGB has dissolved today

The old OpenSolaris Governing Board voted unanimously today to dissolve itself.

By default, OpenSolaris governance now returns to Oracle's hands.

For folks upset by this, let me remind them of illumos. It's a sad note for OpenSolaris, but I think the reborn illumos community will be better than the OpenSolaris community ever could have been.

I do want to thank the (former) OGB members for their efforts, even if they did prove to be in vain.

Sunday, August 22, 2010

Why SAS->SATA is not such a great idea

So, we've had some "issue" reports relating to the mpt driver. In almost all cases, the problems occur where people are using SATA drives hooked into SAS configurations.

Although the technology is supposed to work, and sometimes it works well, it's a bad idea.

Let me elaborate:

  • SAS drives are generally subjected to more rigorous quality controls. This is the main reason why they cost more. (That and the market will pay more.)
  • SAS to SATA conversion technologies involve a significant level of protocol conversion. While the electricals may be the same, the protocols are quite different.
  • Such conversion technology is generally done in hardware, where only the hardware manufacturer has a chance of debugging problems when they occur.
  • Some of these hardware implementations remove debugging information that would be present in the SATA packet, and just supply "generic" undebuggable data in the SCSI (SAS) error return.
  • The conversion technology creates another potential point of failure.
  • Some of these hardware implementations won't be upgradeable, or at least not easily upgradeable, with software.
  • SATA drives won't have a SCSI GUID (ATA specs don't require it), and so the fabricated GUID (created by the SAS converter) may be different when you move the drive to a different chassis, potentially breaking things that rely on having a stable GUID for the drive.

Don't get me wrong. For many uses, SATA drives are great. They're great when you need low cost storage, and when you are connecting to a system that is purely SATA (such as to an AHCI controller), there is no reason to be concerned.

But building a system that relies upon complex protocol conversion in hardware just adds another level of complexity. And complexity is evil. (KISS.)

So if you want enterprise SAS storage, go ahead and spring for the extra cost of drives that are natively SAS. Goofing around with the hybrid SAS/SATA options is just penny-wise and pound-foolish.

But hey, it's your data. I just know that I won't be putting my trusted data in a configuration that is effectively undebuggable.

(Note: the above is my own personal opinion, and should not be construed as an official statement from Nexenta.)

Aug 30, 2010: Update: At a significant customer account, we (meaning Nexenta) have verified conclusively that SAS/SATA expanders combined with heavy ZFS load are highly toxic. So if you're designing an enterprise storage solution, please consider using SAS all the way to the disk drives, and just skip the cheaper SATA options. You may think SATA looks like a bargain, but when your array goes offline during a ZFS scrub or resilver because the expander is choking on cache sync commands, you'll really wish you had spent the extra cash up front. Really.

IPS == FAIL

Look, I really, really wanted to avoid entering the packaging debate. I mean, it's an emotional decision, right?

Well, it's supposed to be.

Except that I've spent nearly an entire day trying to figure out how to onu the latest illumos gate (which includes Rich Lowe's b147 merge). I have gate changes that I desperately need to test in the context of a full install. (Well, I could say "screw it" and just test the bits in place, which I've already done, but that's hardly a complete test.) I can't test them, because I can't figure out how to use the packaging system to install them. And neither can our resident IPS expert, Rich Lowe.

This is no longer an emotional decision for me. Yeah, there are a lot of "emotional" things not to like about IPS. (It forces a dependency upon Python; it's still immature; it seems to fail if you are disconnected from the network; it doesn't seem possible to build and install "just" a single package; apparently there are a lot of magic incantations that nobody outside of the IPS developers really understands; etc.) I was willing to set aside all those "emotional" responses and use IPS, if it worked. If for no other reason than that it did away with BFU, I have been willing to give it my best effort. But the latest situation has left me dead in the water, and apparently NO ONE can help me.

Look, I'm not a complete moron. (Well, maybe you disagree with me, but this is my blog.)

I should be able to make this work. If I cannot, then what kind of barrier is this going to create for participation from other people? Is Rich Lowe going to hold everyone else's hand to get them past these issues?

What happens the next time the pkg folks introduce another flag day?

This is unacceptable.

I'd like to hear other solutions. At the moment, I'm very, very seriously considering gutting the IPS build requirements and having illumos go back to building SVR4 packages natively, using a tool to convert the IPS metadata. (So the metadata would be IPS, but the binary deliverables would be auto-generated SVR4 packages.)
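To make the conversion idea a little more concrete, here is a rough sketch of what the core of such a tool might look like: it reads IPS manifest actions and emits SVR4 prototype(4) entries that pkgmk could consume. This is a hypothetical illustration, not the tool itself. The function names are made up, and it assumes simplified manifests (one action per line, unquoted key=value attributes) and handles only dir, file, link, and hardlink actions.

```python
"""Hypothetical sketch: translate simple IPS manifest actions into SVR4
prototype(4) lines. Assumes one action per line and unquoted key=value
attributes; only dir/file/link/hardlink actions are translated."""


def parse_action(line):
    """Split an IPS action line into (action_name, attrs dict).
    Simplification: values containing spaces or quotes are not handled."""
    tokens = line.split()
    name = tokens[0]
    attrs = {}
    for tok in tokens[1:]:
        if "=" in tok:
            key, _, value = tok.partition("=")
            attrs[key] = value
        # Bare tokens (such as a file action's payload hash) are ignored.
    return name, attrs


def to_prototype(line, pkg_class="none"):
    """Return one SVR4 prototype entry for an IPS action, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, a = parse_action(line)
    if name == "dir":
        return f"d {pkg_class} {a['path']} {a['mode']} {a['owner']} {a['group']}"
    if name == "file":
        return f"f {pkg_class} {a['path']} {a['mode']} {a['owner']} {a['group']}"
    if name == "link":
        # Symbolic link: prototype syntax is path=target.
        return f"s {pkg_class} {a['path']}={a['target']}"
    if name == "hardlink":
        return f"l {pkg_class} {a['path']}={a['target']}"
    # set, depend, license, driver, etc. would need separate handling
    # (pkginfo fields, a depend file, class action scripts, ...).
    return None


def convert(manifest_text):
    """Convert a whole manifest, dropping actions this sketch ignores."""
    out = []
    for line in manifest_text.splitlines():
        entry = to_prototype(line)
        if entry is not None:
            out.append(entry)
    return out


if __name__ == "__main__":
    sample = """\
set name=pkg.fmri value=pkg:/example/hello@1.0
dir path=usr/bin owner=root group=bin mode=0755
file path=usr/bin/hello owner=root group=bin mode=0555
link path=usr/bin/hi target=hello
"""
    for entry in convert(sample):
        print(entry)
```

Run on the sample, this prints one prototype line each for the dir, file, and link actions and skips the set action. A real converter would also have to map set actions to pkginfo(4) fields and depend actions to a depend(4) file, which this sketch leaves out.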

The current situation reminds me of Linus' comments about CVS. I feel the same way about IPS right now. I'm very angry: the tools that are supposed to facilitate development have caused it to cease for me. If the only way for me to move forward is to reinvent SVR4 build systems, then that's what I'll do.

IPS is a failed science experiment. I don't see how it is going to get widespread adoption from anyone (ISVs or otherwise) as it stands today.

Flames to /dev/null. Let me know if you have a solution though.

Update: Rich was finally able to get me to a working state, although I can't ever downgrade. After what I just went through, I never want to. I'm really terrified that nobody really understands the steps it took to get me to a working state, and I am unwilling to force others to go through the same nightmare. So I'm still mad at IPS, and I still think we need to unhitch the illumos cart from it.