Friday, July 30, 2010

Illumos

A number of leaders from the OpenSolaris community have been working quietly together on a new effort called Illumos, and we're just about ready to fully disclose our work to the general public and invite everyone's participation.

We believe that everyone who is interested in OpenSolaris should be interested in what we have to say, and so we invite the entire OpenSolaris community to join us for a presentation at 1PM EDT on August 3, 2010.

You can find out the full details of how to listen in to our conference, or attend in person (we will be announcing from New York City), by visiting http://www.illumos.org/announce. (The final details will be posted there no later than 1PM EDT on Aug 1, 2010.)

We look forward to seeing you there!

- Garrett D'Amore & the rest of the Illumos Cast

Thursday, July 15, 2010

Please Be Patient

With all the ruckus surrounding Oracle's apparent abandonment of the community, and the OGB's stated intention to commit suicide, the uproar in the community has been crazy.

Without giving any details, let me say that a few of us are quietly but diligently working on solutions to the critical problems, and I expect we'll be able to talk much more freely about the solutions we will be offering in early August, which is coming up very soon now.

So, I'm going to humbly ask folks to be patient -- hold your comments, complaints, and flames about Oracle and OpenSolaris and OGB in check please. If you can wait just a little bit longer, then I believe we'll be able to offer a more constructive outlet for your frustration and energies.

Thanks.

Wednesday, July 14, 2010

In NYC for DebConf10

I'll be attending DebConf10 (the Debian developers' conference) in NYC this year. Nexenta will be presenting information about our distribution. It's my hope that we can use this to generate more interest in OpenSolaris technology. If you're in NYC, and want to meet during the first week of August, let me know!

Wednesday, July 7, 2010

ZFS disk monitoring...

So I've posted this on zfs-discuss at opensolaris dot org, but it's been suggested I mention it here too.

It turns out that the ZFS/FMA integration doesn't pick up on drive removals for most disk devices until the filesystem attempts to perform some I/O to the drive. This is rather unfortunate, because if a file system is not busy, you might suffer a loss of redundancy and not find out about it until too late.

It also means that you won't know about failures of hot spare devices until you need to put them into service, since by definition they are idle. (Note: as an exception, running periodic scrubs should detect this too, although scrubs are highly intrusive to the overall I/O load on the system and, as a result, probably should not be performed too often.)

I'm told the Oracle 7000 series appliances have a solution for this problem, but of course the source for that is not in OpenSolaris. (Apparently there are quite a few differences in the core OS between the 7000 series and vanilla OpenSolaris -- unfortunately we can't know because -- unlike with NexentaStor -- we don't have access to the kernel source tree!)

This is not good for folks who use ZFS with ordinary Solaris 10 or OpenSolaris... or with derivatives such as NexentaStor.

To address that problem, I've developed some code called "zfs-monitor" that periodically monitors the health of any physical vdev (disk) that is part of a ZFS pool (hot spare, log, or real device). This code is implemented as an FMA module. When a disk goes offline, zfs-monitor detects it and triggers an FMA event, which allows ZFS to do the right thing. This means that if a disk goes away, even if it isn't in use, whatever action is appropriate will be performed. (The fault is logged in the FMA fault logs and, if appropriate, a hot spare is recruited to replace the failed or offline device.)
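
To make the idea concrete, here is a rough sketch of the polling logic in Python. This is not the zfs-monitor code (the real thing is an FMA module), and every name in it (FaultEvent, probe, poll_once, the device paths) is made up for illustration. It assumes that simply trying to open the device node is a good enough health check, which may well not hold for every driver.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical stand-in for the event a real monitor would hand to FMA."""
    device: str
    role: str  # "data", "spare", or "log"


def probe(path):
    """Assumed health check: can the device node still be opened for reading?"""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    os.close(fd)
    return True


def poll_once(devices, last_state, probe_fn=probe):
    """Probe every device once and report the ones that just went away.

    devices:    dict mapping device path -> role in the pool
    last_state: dict mapping device path -> last known health (updated in place)

    A device that has never been seen before is assumed to have been healthy.
    An event is returned only on a healthy -> unhealthy transition, so a disk
    that stays offline is reported once rather than on every poll.
    """
    events = []
    for path, role in devices.items():
        healthy = probe_fn(path)
        was_healthy = last_state.get(path, True)
        if was_healthy and not healthy:
            events.append(FaultEvent(path, role))
        last_state[path] = healthy
    return events
```

A caller would run poll_once on a timer over every vdev in the pool, idle hot spares included, and pass each returned event on to whatever does the fault handling. The point is that the check happens on a schedule, not only when the filesystem happens to issue I/O to the disk.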

This code is part of NexentaStor 3.0.3. As there are some semantic differences of opinion (what constitutes device failure versus intentional removal by an administrator), the code is unlikely to be pushed into ON without further change. (At the same time, I've fixed a different problem in the ZFS FMRI parsing code, and I've submitted a request to get that fix integrated -- but I've not heard back from anyone at Oracle who is willing to sponsor the change yet.)

I'm happy to share the code for zfs-monitor with anyone who requests it. (In fact, you can examine the code in our open Mercurial repository directly!) Note that for it to work properly, you will also need the fix for the ZFS FMRI parsing bug just mentioned.

At Nexenta, we're committed to innovating and improving upon the great foundation of ZFS and OpenSolaris, and to the reasonable extent possible, we want to share those innovations with the greater OpenSolaris community. Hopefully changes like this demonstrate this commitment in a tangible fashion.