Wednesday, August 5, 2009

A Push A Long Time In Coming

I just pushed the changes associated with PSARC 2005/425, which is basically the removal of the script wrappers /usr/ucb/cc, /usr/ucb/lint, and /usr/ucb/ld. The fix will be in b122 of ON.

This is a good thing for people who build free software bits on OpenSolaris. For a long time, there have been complaints that /usr/ucb on your PATH was particularly toxic (some would say it was toxic anyway, but that's a different matter) -- autoconf scripts were frequently confounded by /usr/ucb/cc, which has done nothing more than generate a useless error message for most people.

So, /usr/ucb/ on your PATH, while not recommended, is at least no longer particularly toxic.

Monday, August 3, 2009

audiovia97 not as rare as I thought

So I was cleaning out my office.... (no jokes from the peanut gallery please!)

I was completely surprised to find that, of 5 old machines I was about to retire to the recycler, no fewer than three -- from totally different manufacturers, one of them a Compaq Presario -- had integrated VIA 82C686 audio (which is supported by audiovia97).

So if you've still got old Celerons or PIII-class systems around, there's a pretty good chance that you can get audio in OpenSolaris on them now.

Another way you know you're getting old...

When someone on Facebook is confused and mixes up your H.S. graduation date with their birthday.

People born back when I was graduating high school are now graduates themselves.

Getting old sucks. But it's better than the alternative.

Sunday, August 2, 2009

ohloh.net

I found an interesting site; it tracks commits by developer, and potentially ranks them, using a "kudos" system. There's also a way to give and to receive "kudos". It also has some source code analysis for projects, though I don't necessarily agree with all that it does. (A project that is 50% comments is not inherently better than a project that is 10% comments... you can't judge code quality or readability or structure based on comments.)

Anyway, I just joined and made it aware of who I am... I have a "kudo" rank of 7. If anyone else here is using this service, I'd be curious to know about it. My account information is here.

Thursday, July 30, 2009

Fast Reboot & Panic

I noticed that Sherry Moore just posted a blog entry about Fast Reboot.

I wanted to take a few moments to mention a few things that I think folks should understand.

First off, the feature (fast reboot) is really useful. For manually initiated reboots to perform administration (such as to reboot after installing new kernel bits or a critical patch), it's wonderful to skip past the various hardware-related initialization, and doing so can really help with downtime costs associated with administrative maintenance tasks like patching.

This is especially true for systems with lots of peripheral buses (SCSI, Infiniband, etc.) that take a long time for low-level BIOS to probe and test. In such situations, BIOS initialization can consume several minutes. Reducing this to a few seconds is a compelling idea.

However, there are some gotchas that in my opinion people should be aware of when using the variant of this that gets used on panic().

  • During a panic situation, all bets are off about kernel or hardware state. (This is why the code in the kernel called panic() after all -- it has deemed it unsafe to proceed.)
  • So, the nice safe quiesce(9E) entry points are not guaranteed to be called. Hopefully they will be, but not necessarily.
  • Some drivers may panic when they find hardware in a state that is beyond their ability to recover. So quiesce(9E) may be functionally unable to put the system back into a sane state.
  • Some hardware simply can't restore properly without a low level PCI reset, which the current fast reboot code skips.
  • If hardware is not quiesce(9E)'d properly, then on the reboot the new kernel can wind up in a situation where a device might be randomly scribbling (via DMA) to physical memory (this can lead to arbitrary data corruption of either kernel or user pages of memory), or might wind up with a stuck interrupt (which may manifest as a hard hang of the machine).

Note that the above situations are not theoretical. I have hit these problems, involving various different bits of hardware ... certain framebuffers that require low level initialization, certain Ethernet parts that don't have a functional software reset mechanism, and a certain WiFi controller that can leave interrupts stuck.

However, all the situations described above are also quite unlikely to occur. They probably occur in fewer than 1% of all potential panic scenarios.

The upshot of this is that I would most definitely not use fast reboot on panic on any machine that is in production or which has critical data. It's a wonderful feature for kernel developers who trash their systems all the time and are accustomed to taking risks. For such uses the shortened reboot time on a panic is a net win compared to the potential risks (which are virtually nil: if a system I'm testing hard hangs or crashes a second time I can always just power cycle it). In a production environment, however, the expectations are different.

In such an environment, you really don't expect to see panic() occur (if it does, you're already on the path of a bug!), but when it does, you want to be 100% certain that you confine the potential damage and get back to a known safe (and good) state. This is why we panic() and reboot, after all. Eliding those time-consuming steps of low-level initialization might at first sound like an attractive way to get to higher uptimes, but if you analyze the situation carefully, it's a potentially riskier proposition that could (admittedly, this is unlikely) cause much greater downtime.

Now that you know the concerns, you are of course free to make your own assessment.

If you do want to turn off fast reboot on panic, then you can just use the following commands (which are taken from Sherry's blog posting). I do recommend that you leave regular fast reboot on; as far as I can see there is no downside to making an administratively requested reboot go faster.
        
# svccfg -s "system/boot-config:default" \
setprop config/fastreboot_onpanic=false
# svcadm refresh svc:/system/boot-config:default
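
If you want to double-check that the change took effect, something like the following should do it. This is only a sketch: the helper name fastreboot_onpanic_status is made up for illustration, and it assumes the property name and service FMRI used in the commands above. It reads the property back with svcprop(1), and reports "unknown" on a system that has no svcprop at all.

```shell
# Hypothetical helper (the name is mine, not part of OpenSolaris):
# print the current value of config/fastreboot_onpanic, assuming the
# property name and FMRI shown in the svccfg/svcadm commands above.
fastreboot_onpanic_status() {
    # Bail out gracefully where svcprop is not available (non-SMF systems).
    if ! command -v svcprop >/dev/null 2>&1; then
        echo "unknown"
        return 1
    fi
    # Read the property back from the refreshed service instance.
    svcprop -p config/fastreboot_onpanic svc:/system/boot-config:default
}
```

Running fastreboot_onpanic_status after the refresh should report the value you just set.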

(Note that fast reboot on panic is enabled by default on OpenSolaris since build 112. However, since nobody should have deployed a system with this on in production -- we've had only development releases since then -- there is probably no urgency to go and immediately change your systems. Of course, that situation might be different if you're reading this blog post at some point in the future.)