I know this is somewhat controversial for some people, but....
A member of my staff (Alex Stetsenko) has pushed an implementation of "beadm" that is in C. This is actually derived from an earlier C implementation, called "tbeadm", which we already had in tree. So at some level, this is a consolidation of two different implementations into a single one. As part of this work, the tbeadm version was modernized and improved to provide i18n capabilities and to behave truly as a drop-in replacement for the python version.
As a result of this change, python is no longer needed at runtime by illumos for anything except IPS packaging. Sites and distributions which do not use IPS packaging (most distros don't, actually) no longer need to install python.
Thursday, December 30, 2010
Tuesday, December 21, 2010
Any illumos fans near Corinth, MS?
I drove across country last weekend, and am in Corinth, MS. I'd be happy to go out for a beer and some chat if there are any illumos fans or OSUGs nearby this week.
Wednesday, December 15, 2010
I sed(1) so!
I just integrated a new sed(1) port for illumos. This is derived from FreeBSD, but it includes a fix for a race condition, and support for translated messages. (FreeBSD friends, please feel free to include these changes back -- I've not changed the original BSD license.)
The legacy sed is gone.
This new sed should work on all the old sed scripts, but there were a few tricky parts that changed -- if you relied on parsing the output of the "l" command, or on the fact that legacy sed only treated content as a byte stream rather than multibyte characters, you might be affected.
Also, I've run into at least one sed script which was malformed but mistakenly accepted by old sed; the new version rejects it, and gives you a meaningful error message instead.
The biggest features in the new sed code:
- support for -i and -I
- support for -E to enable EREs
- much more helpful error messages ("command garbled" was just not very specific)
Enjoy, and please report any problems in the illumos bug tracking system at http://illumos.org/. Thanks!
Update: Note that sed -i requires an argument (the extension) unlike GNU sed where the argument is optional. We can fix that, although this would make us less compatible with FreeBSD sed. (Specifically, it would make it nigh impossible to specify an "extension" starting with a dash.) If someone cares passionately about this, they should file a bug and bring it up on the developer list -- I am happy either way.
Wednesday, December 8, 2010
Update on SATA Expanders
So we've done some more research, largely following up on work done by Richard Elling, and I have an update on the SAS/SATA expander problem. There is at least some good news here.
The problems that we've had in the past with these have centered around "reset storms", where a single reset expands into a great number of resets, and I/O throughput quickly diminishes to zero.
The problem is that when a reset occurs on an expander, it aborts any in-flight operations, and they fail. Unfortunately, the *way* in which they fail is to generate a generic "hardware error". The problem is that the sd(7d) driver's response to this is to ... issue another reset, in a futile effort to hopefully correct things.
Now the problem is that this behavior is also performed, by default, for media errors, e.g. if you have a disk that has a bad sector on it. Of course, if your disk is mostly idle, it won't be a problem. But if you have a lot of I/O going on, it's mostly going to result in a melt-down.
There is good news though, because of the way LSI's drivers are designed.
The LSI mptsas driver at least (and I suspect mpt as well, though I don't have code to look at it) treats "bus-level" resets and "target-level" resets as the same. Both of them do a reset, which will of course reset the expander.
But we can disable the most pernicious reset in sd with the following line in sd.conf:
allow-bus-device-reset=0;
This will allow bus-wide resets to occur, but it will most specifically disable the reset in response to generic hardware and media errors. The relevant section of code in sd.c is this:
if ((un->un_reset_retry_count != 0) &&
    (xp->xb_retry_count == un->un_reset_retry_count)) {
    mutex_exit(SD_MUTEX(un));
    /* Do NOT do a RESET_ALL here: too intrusive. (4112858) */
    if (un->un_f_allow_bus_device_reset == TRUE) {
        boolean_t try_resetting_target = B_TRUE;
        /*
         * We need to be able to handle specific ASC when we are
         * handling a KEY_HARDWARE_ERROR. In particular
         * taking the default action of resetting the target may
         * not be the appropriate way to attempt recovery.
         * Resetting a target because of a single LUN failure
         * victimizes all LUNs on that target.
         *
         * This is true for the LSI arrays, if an LSI
         * array controller returns an ASC of 0x84 (LUN Dead) we
         * should trust it.
         */
        if (sense_key == KEY_HARDWARE_ERROR) {
            switch (asc) {
            case 0x84:
                if (SD_IS_LSI(un)) {
                    try_resetting_target = B_FALSE;
                }
                break;
            default:
                break;
            }
        }
        if (try_resetting_target == B_TRUE) {
            int reset_retval = 0;
            if (un->un_f_lun_reset_enabled == TRUE) {
                SD_TRACE(SD_LOG_IO_CORE, un,
                    "sd_sense_key_medium_or_hardware_"
                    "error: issuing RESET_LUN\n");
                reset_retval =
                    scsi_reset(SD_ADDRESS(un),
                    RESET_LUN);
            }
            if (reset_retval == 0) {
                SD_TRACE(SD_LOG_IO_CORE, un,
                    "sd_sense_key_medium_or_hardware_"
                    "error: issuing RESET_TARGET\n");
                (void) scsi_reset(SD_ADDRESS(un),
                    RESET_TARGET);
            }
        }
    }
The savvy folks here might notice that this is a system-wide setting, which is true. You can set it on a specific instance of sd, which requires more effort. There is also a better way to do this, by setting the reset_retry_count property to zero. However, setting the sd.conf property for that properly is considerably more complex, because of the byzantine syntax that sd uses to set up target-specific property values.
So, I still recommend avoiding these SATA expanders. But if you have no choice, then using this sd.conf tunable may be a reasonable workaround.
At the same time, I'm investigating the possibility of having this disabled by default for all of Nexenta's customers -- and possibly even in illumos. If you're a SCSI expert and have opinions on the matter, please let me know.
Sunday, December 5, 2010
Status Update on the illumos Foundation
I sent this out earlier today:
We are working with the Software Freedom Law Center & Eben Moglen on the creation of the illumos legal entity. Nexenta have enlisted the help of Damien Eastwood (one of the more prominent former Sun lawyers) to help drive this. Jason Yoho at Nexenta is driving this fairly hard as well, so there are now more people than just me pushing this forward as quickly as we can.
We have set a goal that the legal entity (an illumos foundation) should exist with legal presence before the year's end. I'm told that this is an achievable goal.
I'll have more updates on this soon, I expect, but the process is moving forward.
New Advocate - Albert Lee
I'm pleased to announce the addition of Albert Lee (trisk@nexenta.com, aka Triskelios on IRC) to the list of Advocates who can approve integrations into illumos. Albert has been doing a lot of excellent work on both illumos and OpenIndiana, and I'm happy to expand the set of advocates we have available to include such a diligent and talented individual.
The current list of Advocates for illumos-gate is:
- Garrett D'Amore
- Albert Lee
- Rich Lowe
- Gordon Ross
I'm hoping to expand the list to include more non-Nexenta-employees as well. If you're a contributor and would like to help out in this way, let me know. Typically becoming an advocate means you have earned the trust of the rest of the advocates by making several "good" integrations into illumos-gate (4-5 at least usually, although some credit is given for previous integration experience with ON at Sun/Oracle), and have a demonstrated level of thoroughness to help us ensure quality integrations.
Thanks, and again, congratulations and thank you to Albert.
Wednesday, December 1, 2010
New open source iprb(7D) driver
For a variety of byzantine reasons, the iprb driver has never been open sourced, even though everyone who's ever actually had anything to do with it agrees that it should be. (I blame the lawyers on this one...)
So I went ahead and reimplemented -- from scratch -- a new iprb driver. I'd certainly appreciate feedback on the code, which you can read in the webrev. I'm hoping to integrate this into illumos later this week.
zfs should not depend on python... and doesn't anymore
As of a recent integration by a colleague of mine, illumos now has a zfs command that does not depend on python at all.
changeset: 13246:fe5d6e0b0bce
tag: tip
user: Alexander Stetsenko
date: Wed Dec 01 02:30:25 2010 +0300
description:
278 get rid zfs of python and pyzfs dependencies
Reviewed by: gordon.w.ross@gmail.com
Reviewed by: trisk@opensolaris.org
Reviewed by: alexander.r.eremin@gmail.com
Reviewed by: jerry.jelinek@joyent.com
Approved by: garrett@nexenta.com
The zfs command is now entirely a C program. This may make it more friendly for use in other environments or platforms. FreeBSD folks, you might want to incorporate this into your tree. If you do, I'd sure like to know.
Thursday, November 11, 2010
Job Opp @ Nexenta: Director of Sustaining/Certifications
We're looking for a Director of Engineering, to own sustaining (aka bug fixing) and hardware platform certifications (where partners provide a hardware platform and ask us to certify it for use with NexentaStor.)
The job qualifications:
- Must be local to the SF bay area (because the hardware lab is here)
- Must have strong communication skills
- Must be able to deal with stressful situations, and able to "manage" strong personalities
- Must have Solaris/OpenSolaris expertise... hands-on kernel work (crash dump analysis, coding, etc.)
- Desire experience with Storage protocols and products
- Desire expertise with x86 hardware
- Desire perl and/or python skills
To be clear, this will start out as a hands-on position, with a fast-paced startup environment. But the growth opportunities here are enormous. If you think you're up to this, please let me know.
Friday, November 5, 2010
New desktop image

Here's a sample of the new logo as a desktop image. I've not made this available publicly yet (mostly because I don't know how to capture this in a form that will include the gradient and post it where people can find it.) If someone with some gnome expertise on how to share this for others contacts me, I can work to make it available.
Wednesday, October 27, 2010
New illumos logo

Today at the OpenStorage Summit 2010, I unveiled the new illumos logo. We will be updating our branding, which also includes a new font, and other elements, over the next few weeks.
There were other updates on illumos that I covered in this talk. I think this was recorded, but I'm not sure right now where it was recorded and how to access it. I'll be sure to share that when I find out.
Wednesday, October 13, 2010
CFV: web/HTML/graphics people
I have an urgent need to renovate the illumos website. If you'd like to help the project out, and you have got both time and talent, please let me know. A major overhaul of the site is in order, and we need someone willing to dedicate some time on it. There may be some funds available for the right person, but to be clear, illumos can't afford the services of a professional design bureau.
New implementation of printf
So I finally got tired of waiting for someone else to do a printf(1) replacement in illumos for the closed binary from Oracle. I had thought this would be a trivial thing to do via ksh93/libcmd using a symbolic link ala /usr/bin/alias.
Lo and behold, it wasn't! Why? Because ksh93 printf insists (like all ksh93 builtins) on having -- and - getopt style processing. This is fundamentally incompatible with legacy printf. (Why does it do this? So it can dump its builtin man page, e.g. printf --man, to the console. A feature I've railed against in the past.)
Here's what should happen:
% printf -v
-v%
Here's what ksh93 does:
garrett@thinkpad:~$ printf -v
ksh93: printf: -v: unknown option
Usage: printf [ options ] format [string ...]
Now there is an argument to be made that a script which relies on the legacy behavior is fundamentally broken. But it doesn't matter -- the scripts are in the field (there are real examples of them), and the legacy behavior must be preserved. Breaking these legacy scripts just so that we can dump printf --version output is... silly. This is a case where pragmatism wins over purity.
Rather than try to rip this out and fight with the ksh93 folks about "deviation from the upstream" (apparently they view any changes we make in illumos or OpenSolaris as automatically toxic unless they originate from David Korn or Glenn Fowler), I've just gone ahead and implemented my own printf(1) on top of FreeBSD's. This will be the implementation in illumos.
I've added significantly to FreeBSD's code though. Specifically, I added handling of %n$ processing to get parameterized position handling. This is needed for internationalization -- it allows you to change the order of output as part of the output from something like gettext(1). (This is needed when you have to change word order to accommodate different natural language grammars.)
So my implementation is superior to FreeBSD's, and it's superior to the legacy closed binary version. Why? Because rather than a half-hearted attempt at processing positional parameters, my version really handles these, including full support for the usual format specifiers. For example:
New open code:
garrett@thinkpad{4}> printf '%2$1d %1$s\n' one 2 three 4
2 one
4 three
Old closed code:
garrett@master{22}> printf '%2$1d %1$s\n' one 2 three 4
134511600 one
Clearly the old behavior is just plain wrong. For the record, ksh93 does the right thing here too. (Although somewhat older versions of ksh93 would dump core on this command line.)
My diffs (which also include style and lint fixes required for illumos) relative to FreeBSD are online. You can also review a webrev of the changes that I hope to integrate into illumos. The license remains BSD, so the various BSD operating systems (or even Oracle) are free to incorporate these improvements if they like.
Friday, October 8, 2010
illumos gets global

I just pushed a major set of changes:
8 libc locale work needs updated license files
223 libc needs multibyte locale support for collation
225 libc locale binary files should be in native byte order
309 populate initial locales for illumos
As a result, illumos has gained base support for some 157 different locales, spanning 67 languages and 116 different territories. This includes nearly all the major languages of the world -- missing are Serbian, Javanese, Farsi, Malaysian, Burmese, and some languages spoken in central and west Africa. (Some of these will be very easy for someone else to add... let me know if you want one of these and are willing to do the work.)
The support for these locales includes full POSIX compliant collating support, which was completely absent in illumos before this integration.
Also included is a new open source implementation of localedef(1). To my knowledge, this new implementation is the only non-GNU version of localedef that is fully open, and this version is more fully functional than the GNU version. (The GNU localedef lacks full support for collation data.)
Other notes: this is only the base support for these locales. This will for example give localized output from "date". There is quite a lot of additional effort required to fully localize an illumos system, including support for input methods, fonts, and message catalogs for all the various applications. However, with this base support, it makes doing that other work much more practical.
This integration adds nearly 2 million lines to illumos, although far and away the vast majority of it is in the form of data from Unicode and the CLDR (common locale data repository). The ability to import data directly from these sources is the new code that I've written, including a major overhaul of the underlying ctype and collation support in libc to properly support multibyte locales.
It's my belief that with this integration, one of the biggest feature gaps between illumos and Solaris is closed.
Sunday, October 3, 2010
Emacs & Gnome Terminal Co-existence Resolved
For many years, I've been stuck with old xterm, because it was the only one that honored my Meta keys in the same way that GNU emacs did. I could never figure out how to make gnome-terminal work, which always bothered me somewhat. (Notably GNOME terminal has better Unicode support which has lately become important to me.)
I finally found a reference that helped me out. I understood that the problem was conflicting ideas about modifier keys; gnome-terminal uses Mod1, but Emacs uses Mod4. What I didn't know was something I found out here, namely that Emacs only uses Mod4 if it exists. So a better solution for me is to simply clear Mod4 altogether, and both programs happily honor Mod1. (This leaves xterm hosed, but if gnome-terminal works, then I don't need xterm anymore.)
My resulting .xmodmap looks like this:
remove Lock = Caps_Lock
keysym Caps_Lock = Control_L
add Control = Control_L
clear Mod4
This makes my PC keyboard behave sensibly. Alt is Meta. And Caps Lock is consigned to oblivion and the large key that used to have that function is now much more usefully assigned to Control.
I'm posting this here in case anyone else has struggled with this particular annoyance in the past. The clear Mod4 trick was the surprise ticket. (What I'd really like is a way to tell programs which Modifier is "really" the Meta key, given that the programs can't seem to agree on this. And with just one preference -- redefining the numerous bindings in emacs for each sequence, while possible, is not my idea of a fun thing to do.)
The other thing I'd like is a standard way in illumos/opensolaris to integrate .xmodmap. Linux/Ubuntu seems to detect my .xmodmap and handles it nicely.
Tuesday, September 28, 2010
Another ZFS departure
Jeff Bonwick is leaving Oracle.
This is a huge event, because Jeff has been one of the main innovators in operating system technology during his tenure at Sun. While you may know him best for ZFS, he's also the inventor of the slab allocator, which revolutionized memory management when it was created. (And now, pretty much every modern system uses some variation of the slab allocator.)
And he's not just an Oracle VP. Jeff has made integrations into Solaris' ZFS code base on an ongoing basis. This is a guy that has led with actual actions and innovation, backed by code, rather than some boffin who's risen to management and no longer contributes. At some level, he's the model for the kind of technologist I aspire to be.
With so many innovators leaving (and yes, there are other key players in flight), it's going to be very interesting to see how Oracle is able to continue to be a thought leader in the OS technology that they've acquired.
On the one hand, it's really a shame to see so much of the heart and soul of the Solaris engineering core slowly disintegrating.
On the other hand, I think illumos may be the place where Solaris innovation happens, more so than at Oracle, even sooner than I previously expected.
Saturday, September 25, 2010
South/Central American opportunity
I just learned that a peer of mine is looking to add some escalation engineers in Latin America. Job requirements include excellent English, and the ability to deep dive into customer problems including kernel crash dump analysis and C coding ability. If this sounds interesting to you, please let me know.
Thursday, September 9, 2010
Oracle/NetApp ZFS lawsuit dismissed
Others have no doubt already picked up on this, but here it is anyway:
http://www.h-online.com/open/news/item/NetApp-and-Oracle-lift-ZFS-patent-cloud-1076313.html
Hopefully this is good news for downstream ZFS consumers.
Tuesday, September 7, 2010
We're Hiring!
In case you didn't know, a number of companies are hiring illumos talent.
I know of an opening for a USB kernel engineer at one company.
I'm told Joyent is growing like crazy.
And Nexenta is hiring! In fact, here are some of the opportunities we have open at Nexenta:
- QA leads. We have two positions for folks with skills and knowledge to design and build, and run, automated testing of the operating system, with a particular focus on storage and networking. Expertise in NFS, CIFS, iSCSI, ZFS, and the surrounding areas would be highly useful. Good communication skills, shell scripting or perl skills, and an ability to work in the office in Mountain View, are all required. Previous QA leadership preferred.
- Support engineers. We need support engineers across the globe. People who can answer the phone, and triage problems. Solaris or UNIX experience, ZFS clue, good troubleshooting and triage skills, and excellent communication skills are necessary.
- Kernel software engineers. I need people with deep TCP/IP, SCSI, Storage, and Filesystems expertise. Solaris expertise highly preferred, but can substitute FreeBSD or Linux kernel expertise. Highly motivated self-driven super-stars only.
- Sustaining software engineers. Excellent troubleshooting and kernel expertise is required. Expertise in one or more of TCP/IP, SCSI, storage, and filesystems is preferable. Solaris expertise highly preferred.
- IT staff. We have one opening for a mid-level IT engineer. Must be able to deal with Solaris, Linux, Windows, phones, and cantankerous development staff.
Friday, September 3, 2010
Squash-proof?
So everyone has heard me talk about the 800 lb. gorilla with respect to illumos.
One question I keep getting asked is, can the illumos project be "squashed" by this 800 lb. gorilla?
My stock answer had been "no". But I realized something today; I've been wrong.
The way illumos can be "killed" is if the corporate owner of Solaris were to do something to make illumos irrelevant. Like, say, opening Solaris back up (and in this case, I think they would probably need to go further open than they were before).
I'm not worried though. Even if that happens, illumos will have been a major success. But I really don't think it is going to happen.
Wednesday, September 1, 2010
illumos Interest Groups
So, I've been asked by several people who are involved with OpenSolaris User Groups around the world about illumos.
Given the clear demise of OpenSolaris, it seems to me at least, to be kind of silly to continue to meet using that name.
Some groups have reverted to pure Solaris usage. Which is fine for those groups that want to focus on Oracle products and want to come under the Oracle umbrella that it has for user groups.
For groups that are more interested in open technology, perhaps it is time to start up some "illumos interest groups" (IIGs)? (Calling them "User Groups" at this point seems rather premature... I think there are only a very few of us that are actually "using" illumos at this point... but I hope that number will grow very much very soon. :-)
Btw, are there any folks interested in illumos in either Riverside County or North San Diego County? (California) I'd be interested in participating in an interest group if there was one that didn't require me to drive over an hour to get to.
OpenSolaris ARC is Dead
I had tried to dial in to ARC today, but no luck. But then someone else pointed out that we have not seen any ARC cases since the tap was turned off.
In fact, I posted a query about this to the opensolaris-arc mailing list today, and I got back an interesting automated reply:
This mailing list is no longer active and accepting posts. Mailing
list archives can be found at
http://mail.opensolaris.org/pipermail/opensolaris-arc/. You can check
http://mail.opensolaris.org/mailman/listinfo to find another list to
which to send your email.
So, OpenSolaris ARC is dead. This has ramifications that go beyond just ON. Because there are other consolidations that we were promised were going to continue to be developed in the open: JDS, X11, and the pkg-gate. If the decisions for these technologies are no longer being made openly, or even the opinions being made available, then this makes Oracle's promise to continue to work with the community on them seem hollow.
So, what's left for "OpenSolaris" as so named? There are some code drops still being made. How long will that keep up? Are they continuing to take contribution from external parties? (I don't work on those gates, so I don't really know.) I'd like to know if the other consolidations have shut down too. At least the key decisions relating to those consolidations seem to have moved behind closed doors.
Monday, August 23, 2010
OGB has dissolved today
The old OpenSolaris Governing Board has dissolved unanimously today.
The OpenSolaris governance is now in default, and returns to Oracle's hands.
For folks upset by this, let me remind them of Illumos. It's a sad note for OpenSolaris, but I think the reborn Illumos community will be better than the OpenSolaris community ever could be.
I do want to thank the (former) OGB members for their efforts, even if they did prove to be in vain.
Sunday, August 22, 2010
Why SAS->SATA is not such a great idea
So, we've had some "issue" reports relating to the mpt driver. In almost all cases, the reports involve situations where people are using SATA drives and hooking them into SAS configurations.
Although the technology is supposed to work, and sometimes it works well, it's a bad idea.
Let me elaborate:
- SAS drives are generally subjected to more rigorous quality controls. This is the main reason why they cost more. (That and the market will pay more.)
- SAS to SATA conversion technologies involve a significant level of protocol conversion. While the electricals may be the same, the protocols are quite different.
- Such conversion technology is generally done in hardware, where only the hardware manufacturer has a chance of debugging problems when they occur.
- Some of these hardware implementations remove debugging information that would be present in the SATA packet, and just supply "generic" undebuggable data in the SCSI (SAS) error return.
- The conversion technology creates another potential point of failure.
- Some of these hardware implementations won't be upgradeable, or at least not easily upgradeable, with software.
- SATA drives won't have a SCSI GUID (ATA specs don't require it), and so the fabricated GUID (created by the SAS converter) may be different when you move the drive to a different chassis, potentially breaking things that rely on having a stable GUID for the drive.
Don't get me wrong. For many uses, SATA drives are great. They're great when you need low cost storage, and when you are connecting to a system that is purely SATA (such as to an AHCI controller), there is no reason to be concerned.
But building a system that relies upon complex protocol conversion in hardware just adds another level of complexity. And complexity is evil. (KISS).
So if you want enterprise SAS storage, then go ahead and spring for the extra cost of drives that are natively SAS. Goofing around with the hybrid SAS/SATA options is just penny wise, and pound foolish.
But hey, it's your data. I just know that I won't be putting my trusted data in a configuration that is effectively undebuggable.
(Note: the above is my own personal opinion, and should not be construed as an official statement from Nexenta.)
Aug 30, 2010: Update: At a significant account, I can say that we (meaning Nexenta) have verified that SAS/SATA expanders combined with high loads of ZFS activity have proven conclusively to be highly toxic. So, if you're designing an enterprise storage solution, please consider using SAS all the way to the disk drives, and just skip those cheaper SATA options. You may think SATA looks like a bargain, but when your array goes offline during ZFS scrub or resilver operations because the expander is choking on cache sync commands, you'll really wish you had spent the extra cash up front. Really.
IPS == FAIL
Look, I really, really wanted to avoid entering the packaging debate. I mean, it's an emotional decision, right?
Well, it's supposed to be.
Except that I've spent nearly an entire day trying to figure out how to onu the latest illumos gate (which includes Rich Lowe's b147 merged in). I have gate changes that I desperately need to test in the context of a full install. (Well, I could say "screw it", and just test the bits in place -- which I've already done, but that's hardly a complete test.) I can't test them. Because I can't figure out how to use the packaging system to install them. And neither can our resident IPS expert, Rich Lowe.
This is no longer an emotional decision for me. Yeah, there are a lot of "emotional" things not to like about IPS. (It forces a dependency upon Python; it's still immature; it seems to fail if you are disconnected from the network; it doesn't seem possible to build and install "just" a single package; apparently there are a lot of magic incantations that nobody outside of the IPS developers really understands; etc.) I was willing to set aside all those "emotional" responses and use IPS, if it worked. If for no other reason than the fact that it did away with BFU I have been willing to give it my best effort. But the latest situation has left me dead in the water, and apparently NO ONE can help me.
Look, I'm not a complete moron. (Well, maybe you disagree with me, but this is my blog.)
I should be able to make this work. If I cannot, then what kind of barrier is this going to create for participation from other people? Is Rich Lowe going to hold the hands of everyone else to get past these issues?
What happens the next time the pkg folks introduce another flag day?
This is unacceptable.
I'd like to hear other solutions. At the moment, I'm very very seriously considering gutting the IPS build requirements and having illumos go back to building SVR4 packages natively, using a tool to convert IPS meta data. (So meta data would be IPS, but binary deliverable would be auto-generated SVR4 packages.)
The current situation reminds me of Linus' comments about CVS. I feel the same way about IPS right now. I'm very angry ... the tools that are supposed to facilitate development have caused it to cease for me. If the only way for me to move forward is to reinvent SVR4 build systems, then that's what I'll do.
IPS is a failed science experiment. I don't see how it is going to get widespread adoption from anyone (ISVs or otherwise) with it as it stands today.
Flames to /dev/null. Let me know if you have a solution though.
Update: Rich was finally able to get me to the point of working. Although I can't ever downgrade. After what I just went through, I never want to. I'm really terrified that nobody really understands the steps it took to get me to a working state, and I am unwilling to force others to go through the same nightmare. So I'm still mad at IPS, and I still think we need to unhitch the illumos cart from it.
Thursday, August 19, 2010
The Tap Is Turned Off
A little birdie told me that the last update to Oracle's hg repository for ON was this one:
changeset:   13149:b23a4dab3d50
tag:         tip
user:        Sukumar Swaminathan
date:        Wed Aug 18 15:52:48 2010 -0600
description:
6973228 Cannot download firmware 2.103.x.x on Emulex FCoE HBAs
6960289 fiber side of emulex cna does not connect to the storage
6950462 Emulex HBA permanently DESTROYED, if the firmware upgrade is interrupted
6964513 COMSTAR - Emulex LP9002 fail to return a SCSI Inquiry correctly to a VMware 4 Initiator
From here on out, Illumos and Oracle Solaris diverge. The funny thing is, based on the calls I've had today, I could hardly be more optimistic about the future of illumos and the code base that was formerly called Solaris. Even more talent is getting behind this effort every day.
I'm very very excited... frankly Oracle shutting down the tap just really opened up the opportunity for us to really start innovating, in ways that I would have been loath to do if we were still trying to maintain a very closely aligned source tree.
I think it's entirely possible that Oracle may wind up viewing Illumos as the upstream rather than the reverse!
More milestones...
Illumos milestones reached today.
a) I pushed a working tr, and was able to build illumos on a system running illumos. This is the first time this has been possible.
b) Richlowe pushed a merge to build 147. There are probably consequences for developers (more updates required for bits that are not part of ON) -- stay tuned for updates about that.
All in all, things are moving quickly.
Tuesday, August 17, 2010
Presenting Illumos at SVOSUG
I'm pleased to announce that I'll be giving a brief talk at this month's SVOSUG meeting, Thursday Aug 26, at 6:45 pm in Mountain View. It will cover Illumos, and I will be joined by a colleague who will talk a bit more about Nexenta as well. If you're in the Bay Area at that time, it would be great to have a chance to meet.
I expect there will be some (probably significant) consumption of alcoholic beverages after the meeting, at an as yet undetermined location.