Monday, June 4, 2012

Ivy Bridge Motherboards -- Don't *Really* Exist!

So, if you're like me, and you've been shopping for a new system lately because you're last one is dying or dead, you might see some fancy looking Intel E3-1200v2 Xeon systems.  (Why Xeon?  Because ECC memory is a good thing.  I attribute some of my difficulty with my last system to using non-ECC memory.  Because its not ECC, I don't really know whether the problem was in RAM or CPU, but I digress...)

Great, so you pick out a E3-1240v2 Ivy Bridge CPU, and then you look for a System Board.

I found the Supermicro X9SCA-F-O.  Looks like a sweet board.  The spec claims support for the E3-1200v2 cpus, and 1600MHz RAM.  It has a separate IPMI LAN support, as well.  Sweet!  (Look closer though....)

So, package arrives from NewEgg, and you're ready to go.

Assemble the kit.... if you're a software guy like me this might take a couple of hours.

Power it on.

Beep!  Beep!  Beep!  Beep!

WTF does that mean!?  Google "4 beeps supermicro".

Oh, "Unsupported processor."  Further reading....

The system needs a BIOS update (to v2.0) in this case.  Seems reasonable.

Download the instructions....

"Step 1. First, boot the motherboard with an E3-1200 series (not E3-1200 V2)
processor..."

WTF!?!  A new E3-1200 (that's Sandy Bridge mind you, still fairly recent stuff) processor is going to cost me like 200 bucks, minimum.  And I can't return it after I buy it except to exchange it with a different CPU.  But I already have the E3-1240v2.

Anyone want to buy a very lightly used E3-1220 from me?  It looks like I'll be purchasing one tomorrow. Ultimately, I can't afford not to burn the $200, because I simply don't have the time to keep screwing around this stuff.  I've burned the better part of today trying to wrangle hardware.  (Hmmm.... where's a Systems Engineer when you need one?)  Conversely, anyone got one lying around they want to give, loan, or sell really cheap?

Shame on you Supermicro.  There are at least several things you could have done better here:

1. Make it much clearer that if buying a unit, it should include a legacy CPU first.  (I.e. that Ivy Bridge support is only for people upgrading CPUs and not building new systems.)

2. Have a separate SKU for systems that have the requisite BIOS changes.  Even if the silk-screen on the board is the same (and it shouldn't be, because the silk screening indicates only 1066 and 1333 MHz RAM supported, although the BIOS enables 1600 MHz), having the separate SKU lets buyers make sure they have the bits needed.

3. Work with retailers to ensure that existing stock is returned and upgraded as soon as possible.

4. Offer an expedited RMA process for people like me who are screwed in the meantime.  (To be fair, I've not called supermicro yet, but Googling tells me people have not had good luck with this option yet.  The only solution that has worked for people is to get their own 2nd processor.)

5. Technically, this board has an IPMI unit on it.  I think that means it has a separate microprocessor. Why can't I update the BIOS using that?  That would eliminate the need to buy a second CPU, at least for those of us with IPMI enabled gear.

Lessons for me:

1. Its probably worth a couple hundred bucks to let someone else put together the system for me.  I'm a driver guy -- futzing around with actual bits of silicon and cabling, its not my forte.

2. There's a reason it's called the bleeding edge.  I think I got cut with it today.

Sunday, June 3, 2012

Why I *hate* external dependencies!

So, I've been trying to setup a system that can build illumos-gate.  Because my old system had a memory DIMM fault, and isn't usable anymore.

I thought, hmm... let me try something other than OpenIndiana.

Big mistake.

Only OpenIndiana seems able to build vanilla illumos-gate right now.

Why?  Because of external dependencies!  I tried building on OmniOS.  The stopper there -- Python 2.4!  On SmartOS, the dependency is IPS itself.  It seems only OpenIndiana is suitable for building stock illumos-gate.

The problem is that our build is inherently very sensitive to the build environment.  It makes life incredibly unpleasant if you try to build on any system that is not configured exactly as we specified.

This, IMO, is an unacceptable situation.

I will speak loudly against any attempt to make changes that introduce further external dependencies.  The fact that some already exist is no excuse.  Every external dependency makes working on the tree more painful.  If you're thinking of a change that would move something out of tree, but add a build-server dependency, think again!  

These dependencies are a serious barrier for contribution.

We (illumos) should take a page from NetBSD, who have successfully arranged that their environment is self contained to the point that they don't care about their build server at all -- all it needs to be able to do is bootstrap gcc and run automake/autoconf.  There is something very elegant about this.

Expect to see me on yet another crusade to kill dependencies on external runtimes, be it Python, Perl, or whatever.   Our tree should, IMO, be as "stand-alone" as possible.

If that means you have to write C, so be it.  Grow up and get a real compiler dude.  I know languages like Python and Perl are attractive because they have a lower barrier to learn -- but depending on them -- and far worse -- depending on *specific* versions of them -- is toxic for anyone else who wants to work on the code in a different environment.   (This has nothing to do with your language of your choice, but everything to do with the pain and suffering your language choice creates for anyone who just wants to do "make install".)

Tuesday, October 25, 2011

illumos hackathon a resounding success

Yesterday we had our first ever illumos hack-a-thon. About a dozen people from the community showed up, and worked on some very very cool things. At the end of the day, about a half-dozen different demos were done, to show what was worked on.

We'll be posting photos shortly. But here are some of the projects that got attention:
  • tab completion for dcmds and data types in mdb
  • ::print for DTrace
  • expanded truss support for ZFS ioctls (nvlist expansion)
  • time-ordered output for DTrace
  • a comment field in the vdev label for ZFS pool devices
  • the ability to change the GUID of a ZFS pool
Several other projects were in progress.

We selected these projects out of a much larger list of project proposals (which I'll post soon), based on the what people thought was most useful, and the ability to achieve results in a single day hack-a-thon. (And what people we willing to either work on, or mentor.)

Most importantly, people got to work on areas that they weren't intimately familiar with, and work with mentors who were much more familiar. For my own part, I had a blast working with George Wilson to implement the ability to alter the GUID of a pool (which is going to have some further uses we can talk about later). My webrev of these changes is now on-line, so I welcome review feedback.

We had domain experts like Adam Leventhal, Matt Ahrens, Eric Schrock, George Wilson, Gordon Ross, and Dan McDonald on hand, to help mentor projects if they needed it, and the amount of cross fertilization was amazing.

While none of the actual code changes are absolutely earth-shattering, they are still very very cool, useful things, that many of us will be happy to have available. I'm already imagining several useful use cases for the ability to reguid a pool, for example. And I can't wait until I have ::print in DTrace and tab completion in mdb. These will make my life significantly better.

Ultimately, this was the most enjoyable coding experience I've had in a long time. I can't wait until we do it again!

Friday, July 29, 2011

NexentaStor 3.1 available now



After a long, and arduous, release cycle, I am pleased to report that NexentaStor 3.1 is available now.

Customers running existing 3.0 installations may upgrade at no cost.

This release includes a number of key features, including some significant improvements for performance and manageability.
Folks using SCSI target mode will probably see the biggest performance boost relative to earlier editions of NexentaStor, especially those folks using NexentaStor to serve up storage to VMware guests -- thanks to the VAAI offload support that is part of this release. And for the record, yes, this release includes the fix the long standing problem with iSCSI timeouts.

For ZFS fans, this release also includes the updates for ZFS version 28. (This does mean that folks upgrading need to be cautious -- their pools will not be automatically updated to ZFS version 28, but if they are manually updated then there will be no way to move those pools back to an older release. That also means that pools should not be updated in HA cluster configurations unless both cluster partners are updated to NexentaStor 3.1 first.) This also means faster snapshot creation and deletion, and the ability to import pools in read only mode or that have encountered a failure of their SLOG device, making some disaster recovery scenarios much less challenging.

Nexenta Core Platform (our general purpose Debian based open core) will be updated shortly to 3.1 as well -- probably sometime in the next week or so. This will be the same core software as supplied with NexentaStor. (We do strongly encourage folks using Nexenta Core Platform for serving up storage to use our commercial NexentaStor product. There is even a free edition for users with up to 18TB of data.)

We are already (and have been for a while actually) hard at work on the next release, which will be based upon illumos, and include a number of other innovative features. Stay tuned for an update.


Thursday, July 28, 2011

Bombproof taskqs

As part of fixing some recent bugs, I integrated the following into illumos:

734 taskq_dispatch_prealloc() desired
943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP

The interesting one is the first of these. The interface is actually called taskq_dispatch_ent(), and is private to the "consolidation" (i.e. for use within bundled code only).

What this interface provides for, however, is a way to bomb-proof your taskq dispatches, if you can arrange for the dispatching state structure (taskq_ent_t) to be allocated in advance. This means you never have to worry about the possibility of a dispatch failing due to insufficient resources.

What's even cooler, is that the cost of the dispatching is much cheaper; taskq_dispatch() was the hottest piece of code on a very busy storage server. Now it goes much much faster, because it is just twiddling some linked list pointers and sending a signal to wake up the thread processing the taskq.

More importantly, the fact that we don't ever have to sleep, or have an expensive call into the kernel memory subsystem, has solved some frustrating "stalls" in the storage subsystem.

I encourage developers working with taskqs in illumos to have a look at this interface. If you can use it, you will simplify your code, and shorten your run-paths. Both of which are good things. The only limitation: this interface is not available for dynamic taskqs. (Which makes sense since dynamic taskqs might need to allocate whole threads.)

Monday, June 13, 2011

illumos podcast

Constantin Gonzalez recently interviewed me for his "HELDENFfunk" podcast series, while I was in Amsterdam for our European User's Conference. Also interviewed were OpenIndiana founders Alasdair Lumsden and Andrzej Szeszo.

illumos Panel Discussion in SF Bay Area

All:

There will be a panel discussion about illumos as part of the San Fransisco and Silicon Valley OpenSolaris User's Group meeting tomorrow, June 15. I will be joining Bryan Cantrill and Adam Leventhal to answer your questions about illumos.

The doors open at 6:45pm, and we will continue until about 8pm.

The location is 275 Middlefield #50, Menlo Park, California. Hope to see you there!