I'm also looking for folks to review my mxfe driver. It is posted at http://cr.opensolaris.org/~gdamore/mxfe
Thanks!
Wednesday, June 20, 2007
hme code reviewers sought
I need to get code review coverage over the hme GLDv3 conversion. This is also a good chance to learn what it takes to convert a legacy DLPI driver to GLDv3.
If you can help out, please look at the code at http://cr.opensolaris.org/~gdamore/nemo-hme/
The sooner I can get quality code review and test coverage, the sooner we can put this back! :-)
The Need For Public Test Suites
Now that OpenSolaris is supposed to be "Open", the community needs a way to perform quality assurance tests so that community contributions do not block on Sun QA.
Currently, putback of code changes to Solaris requires QA validation. For example, in order to putback my updated iprb and hme drivers (or my new mxfe driver), I have to get QA coverage. This means that I also have to get the time of someone from Sun, which can be challenging.
In order to free the community from Sun's grip, we have to have alternatives, so that community members can perform testing, providing the quality assurance (Open)Solaris needs without blocking progress.
Hopefully someday soon the efforts of the folks who own the test suites to open them up will address this problem. For now, we just have to wait...
Monday, June 18, 2007
On life, the Universe, and everything...
Well, maybe not so much the Universe, as our own galaxy....
I recently came across a statement that there are an estimated 100 billion stars in our galaxy. I started to wonder about the odds of us encountering sentient life in the galaxy, so I ran some rough calculations, just to estimate.
Astronomers estimate that approximately three out of four stars may harbor planets. (Basically, any unary system, plus any binary system where the companions orbit at least as far from one another as Pluto orbits our own star.) Again, these are rough estimates. So, maybe 75 billion planetary systems exist in our own galaxy!
For the moment, let's call the probability of a planetary system harboring a planet capable of supporting life p. Let's call the probability of life developing on such a planet l. Let's call the probability of sentient life developing from more primitive forms (at any point, without regard to the time it takes) s. Further, let's assume that the average age of all stars in our galaxy is close to 5 billion years. And let's assume that the time it takes for sentient (not necessarily civilized!) life to develop is close to what it took here on Earth, and that a sentient life form remains on the planet for about 100,000 years. (This is similar to the span of time that has been postulated since the first cavemen appeared here on Earth.)
Then the number of planets currently harboring sentient life in our galaxy can be expressed by:
75 * 10^9 * p * l * s * 100,000 / (5 * 10^9)
Simplifying terms:
1.5 * 10^6 * p * l * s
As probabilities p, l and s approach unity, we have approximately 1.5 million sentient species in the galaxy right now! (Regardless of whether they are space-faring or not.) This also ignores galaxies other than our own (there may be 100 billion such galaxies!)
Assume 1 per cent for each of these probabilities, and an entirely different picture comes up:
1.5 * 10^6 * .01 * .01 * .01 = 1.5 species in our galaxy right now.
The real question is, what are the values of p, l, and s? Well, let's look at them:
Looking at our own Solar system, we have 8, 9, or more planets, and a number of moons. At least one of them supports life (Earth!). It's likely that at some point in its history, the conditions for supporting life were present on Mars, and it is also possible that conditions for supporting life exist elsewhere in our own system. (We are considering various moons, for example.) So it does not seem unrealistic to hypothesize a fairly large value for p. Let's randomly pick a value of .75.
The probability l still seems a bit unclear. I certainly hope that we find it to be fairly large, but observationally we have no information other than our own planet. A sample size of 1 is too small to tell us anything. But, if we assume that other places in the galaxy are somewhat likely to have undergone similar processes as our planet did, the probability may be at least .25. (Again, a semi-random value.)
With those values, we should have 1.5 million * .75 * .25 = 281,250 planets harboring life (of any kind) in our galaxy.
What is the value of s? Well, that's the biggest question. But even if it is quite small, say .01, we still have a nice value of around 2800 different sentient species in our galaxy right now!
I'm really hopeful that now that we are getting the instrumentation to locate these extrasolar planets, and even to determine some things about them, such as their chemical makeup, we may be able to start finding evidence of this life. I hope that in the coming decade or two we will start sending out the first extrasolar probes to begin more direct observation of some of the other planetary systems. (We know that it costs quite a bit to build such a probe. But what is the incremental cost of additional probes? Could we send out 100, 1,000, or even 10,000 such probes to different systems?)
Thursday, June 7, 2007
eri GLDv3 and nemo fixes putback
In build 67, you'll find that eri(7D) is now a Nemo driver, with full support for IP instances, VLANs, trunking, etc.
As a consequence, you may have to fix scripts that do ndd /dev/eri since they now need to use /dev/eri0.
Nemo driver developers: you no longer need to syslog link status changes. In fact, please don't, because Nemo does it for you now.
Next up, hme, and (surprise!) iprb. (iprb was done last night on a bet... Steve owes me a beer.)
Saturday, June 2, 2007
Is GLDv3/nemo conversion of legacy drivers worthwhile?
That very question, specifically with regard to hme and qfe, but also some others, has come up lately. I'm of one mind (I think my position is clear by the very fact that I've invested effort here), but not everyone shares my opinion.
In order to have an on-line concrete resource I can point internal Sun naysayers at, I'm asking you to voice your thoughts here, by posting a follow up to this blog. (Sorry, no anonymous posts, but that means your posts will carry all that much more weight.)
Do you still use hme/qfe in systems? What about Sun Trunking with qfe? Would you upgrade to Nevada if there was GLDv3 support for these NICs? Would Nemo features in qfe/hme help you? Would a port of hme/qfe to x86 be useful to you?
Please post a follow up on my blog here, with your opinions!
more network driver updates
I just completed my code reviews of eri, and got approval from the owners of the code to commit the changes. I expect that as a result the eri conversion to GLDv3 (plus major cleanups in the code) will be putback next week ... probably on Wednesday afternoon.
Why Wednesday? Well, I also need to commit PSARC 2007/298 and 2007/296, which the eri driver depends on. 2007/298 was the source of much debate lately, but I think a consensus has been achieved, and the changes should go in once I get the final blessing from PSARC (which at this point is pretty much a foregone conclusion.) Code reviews and testing have already been done.
I've also sent an mxfe card to Alan DuBoff, so he can run it through the NICDRV battery of tests. Hopefully as a result mxfe will be integrated soon. I'm anxious for Alan to commit afe (he has asked that I not take this over from him), so that I can quickly convert it to GLDv3 as well.
For some of the other legacy NICs (iprb, rtls, etc.), I've been asked to provide information about GLDv2 to v3 conversions, because apparently she (an intern at Sun) wants to try her hand at converting at least one of them. Since these NICs are still common on PC motherboards, I applaud this effort.
As far as hme/qfe/ce go, more on that in a follow up.
Friday, May 25, 2007
GLDv2->GLDv3 conversion notes
I was recently asked to provide some notes about GLDv2 to GLDv3 conversion for NIC drivers. Here's a rough draft of them. (This is cut-n-paste from mail I sent to an intern at Sun... I'm posting them here so the knowledge isn't lost.)
It is really helpful if you don't try to implement 100% of the features of GLDv3 in the first pass. (Some of the existing GLDv3 drivers, such as rge and nge, have incorrectly provided stubs for some functions, so don't use those as references.) Specifically, I would not attempt to implement mac_resource allocation (MC_RESOURCES, etc.) or multiaddress support. Anyway, maybe these notes will save someone somewhere else some effort, or inspire someone to pick up another driver and convert it.
You really do need to implement VLAN full frame sizes for MTU if your hardware can do it. Almost all NICs can. Sometimes the code to do it isn't in any Solaris driver; my preferred reference for alternate code sources is the NetBSD tree, which has an OpenGrok server for their code at http://opengrok.netbsd.org/ Ask me if you have any questions about VLANs. It helps if you have a switch where you can test VLAN frames.
With GLDv3, there is no reset function, so you have to figure out how to put that in attach (for both DDI_ATTACH and DDI_RESUME!), or in the mac_start() function.
With GLDv3 the stats are quite different. Pay attention, and look at the headers to figure it out.
Many GLDv2 drivers don't do the mac_link_update() call. You should add those for NWAM, IPMP, and correct kstat reporting.
You should rip out any attempt to log to the console on link up, down, or carrier errors. See PSARC 2007/298 for details.
GLDv3 wants to operate on mblk_t's that are chained by b_next. Often you can use the old functions from a new function that just walks down, or builds up, the list (depending on whether it's receive or transmit.)
Pay careful attention to locking. Try not to call GLDv3 functions with locks held. (I have been pressing to allow mac_tx_update and mac_link_update to be called with driver locks held. Right now it's safe, but I can't seem to get a promise from the Nemo group yet.)
You see all those cyclics in some drivers? Try not to use 'em if you don't have to. What I usually do is cheat and use the on-chip timer if I need some kind of time driven functionality.
GLDv3 never explicitly initializes the physical addresses on the NIC. GLDv2 used to always call gldm_set_mac_addr()... so some drivers expect this. You may need to do that yourself in the mac_start() routine.
All these NICs...
So I need to JumpStart a new system today... I'll just stick in a NIC and boot it with my etherboot PXE CDROM. No problem, right?
Well, let's see, first I need a NIC that Solaris supports. Inventorying what I have in my spare hardware today:
- Netgear GA311, rev A1 (RealTek 8169S-32, unsupported variant of rge)
- Netgear FA311, rev C1 (Nat-Semi DP83815D, unsupported)
- Netgear FA310TX, rev-D2 (Lite-On LC82C169, unsupported, see below)
- 3Com 3CR990 TX-97 (unsupported)
- D-Link 530TX rev A-1 (dmfe, no x86 support)
- Zyxel gigE (Via GbE chip, uncertain)
- Linksys LNE100TX v4.1 (unsupported, yet, see below)
- Linksys NC100 (unsupported, yet, see below)
- Macronix MX98715AEC (unsupported, yet, see below)
- Unbranded RTL8139B (supported, rtls, nevada only)
- 3Com 3C900-TX (supported, elxl, for now)
I guess I have a habit of collecting NICs.
Now, the Linksys boards are going to soon be supported by afe, if Alan ever gets his putback of my driver done. The Macronix board will be supported by mxfe later this week, once I get it reviewed and putback.
At one point I had a driver (pnic) sort of working for the LC82C169 (Lite-On PNIC), but I abandoned it because the PNIC was such a piece of crap that I figured anyone with one of these was better off throwing it away and replacing it with another NIC (as long as it wasn't a Realtek 8139!) Maybe I'll revive that project one day. Probably not, since Lite-On didn't sell too many of them, I think. (The PNIC has some horrible hardware bugs, and the two major revisions, the 82C169 and 82C168, have quite different methods of handling 802.3u autonegotiation.)
I also started a driver for the Nat-Semi chip (nsfe), but abandoned it. I think this chip is also found in motherboards, where it is called an SiS part. I think Murayama also has a driver available for it.
I'd really like to see support for the others expanded upon. Maybe I need to look at dmfe some more, because there really shouldn't be any reason it couldn't support x86 platforms. (D-Link sold a lot of DFE-530TX boards, IIRC.)
This also suggests that the elxl driver, which has been slated for EOF, really shouldn't be. One of the reasons I've kept that old NIC around was just because it was one of the few that was supported by Solaris 8 and earlier. I suspect I'm not the only one to have done this. I think the problem is that this driver is not open source. But open source variants exist... maybe someone should look at replacing elxl in Solaris Nevada with a FOSS replacement.
Some of these, Murayama has already written drivers for. I would dearly like to see his vel in Solaris Nevada, along with conversion to GLDv3.
Tuesday, May 22, 2007
IP Instances, GLDv3, and mxfe
I recently decided that I wanted to create a zone with an exclusive IP instance, so that I could run IPsec (and specifically "punchin") in it. I have lots of NICs floating around, so I thought it would be trivial.
Turns out that all my NICs were GLDv2, and that IP instances require GLDv3.
My solution? Converting mxfe (which drives the cards I have in my system) to GLDv3. I figured it would be easier/faster than going out and buying a new Realtek card. And it would have been, if not for one really annoying problem in mcopymsg() (see my previous post for that rant.)
Anyway, mxfe is humming away nicely now as a GLDv3 NIC on my system. I even got VLANs working with full MTU frames. Yay. I filed PSARC 2007/291 today, if you're interested in it. I'll post the driver sources up somewhere shortly.
(On another note, mxfe and afe are "suboptimal" drivers... they just blindly bcopy data, do nothing to reduce tx interrupts, and basically violate all the normal rules for making performant NIC drivers. But they work pretty well, for all that.)
Why Side Effects Are Bad
This entry could just as easily have been titled "Why Bad Documentation is Worse Than No Documentation".
I noticed that some functions from strsun.h are now part of the DDI. Great, I thought, I'll update my driver to use them as part of the general GLDv3 cleanup.
One major surprise, which I spent about 5-6 hours figuring out tonight: mcopymsg(9F) has a side effect that isn't documented!
Specifically, it does freemsg() on the buffer passed in.
Don't believe me? Check the source!
The manual page says nothing about this. And reading logically from the name, you'd not think it would do this. The side effect should never have been designed in, in the first place. But if the man page referenced this side effect, I might, just might have caught this problem a couple of hours ago.
In my particular case, it was causing hard hangs most of the time, until I finally got a panic that pointed me at the root of the problem. (Yes, I probably should have set kmem_flags != 0. Next time.)
/me throws brick at whoever wrote and edited the man page.
/me throws pallet of bricks at whoever designed mcopymsg with this side effect in the first place
Arrgh. Well, this will probably help me figure out several problems I've run into lately.
Sunday, May 13, 2007
ZFS to the rescue!
So this weekend I had to do a system reinstall. Thankfully I had all my data on a pair of SATA drives in a ZFS raidz. But I had to totally reinstall my system with a new motherboard, new SATA controller, etc.
I had to redo a bunch of things manually... NIS, passwd, DHCP, etc.
The one thing that I didn't have to worry about: ZFS. I just plugged my SATA drives in and did "zpool import -f data" (my pool was called "data", which I could have figured out by just doing "zpool import" without options.)
That was it. One command only, and my raidz was back in business, mounted in the right place, and even the right ZFS filesystems were NFS exported with the right options. Thank you, ZFS!
ZFS developers, I owe you a round, or three. Let me know if you want to collect. :-)
Favorite things in OpenSolaris not in Solaris 10
A few things that I love about OpenSolaris, but that Solaris 10 lacks:
* Xorg default support for Intel GMA 950
* SATA ATAPI device support
* WPA (coming in build 64)
* NWAM (network auto-magic)
* DMFE is GLDv3
I'm not sure which of these will be coming to a Solaris 10 update in the future, but I can tell you I was immensely pleased with my upgrade from Solaris 10 update 4 (on an Intel mobo, with a Core 2 Duo CPU) to Solaris Nevada b62. As part of the deal, I switched to a SATA DVD drive; my system is now entirely SATA (no legacy PATA ribbon cables in the box!)
I'm sure there are lots of other useful features too, but at this point, I've put Nevada into "production" use at home, and I'm not looking back.
(And yes, I realize b62 isn't the latest, but I didn't have a copy of snv_63, and the machine I was "reinstalling" ... thanks to an unexpected mobo replacement, was my network server.)
Learning to hate SMF
Some of you may recall my recent putback of the removal of in.tnamed.
Well, there has been some nasty fallout, thanks to SMF and the upgrade process. snv_64 (which will have to be respun as a result of this nastiness) was hanging during upgrade, thanks to chicken-and-egg dependencies in the upgrade script.
I fixed the hang, but there is a warning message coming from inetd that I can't seem to locate.
Along the way, I've found references to the network/tname service in a few surprising places. The things I've had to edit, thanks to SMF:
usr/src/tools/scripts/bfu.sh
usr/src/pkgdefs/SUNWcsr/postinstall
usr/src/cmd/svc/profile/generic_net_limited.xml
usr/src/cmd/svc/prophist/prophist.SUNWcsr
usr/src/cmd/cmd-inet/usr.sbin/tname.xml (removed)
And *still* we see a warning from inetd. See 6556092 in bugster for more info.
Anyway, I'm waiting for folks to decide whether to allow the warning to stay, or to back out the change to remove in.tnamed. If the latter course is taken, I will run screaming from the process, and just leave in.tnamed alone.
In my opinion, removing the tname.xml should have been sufficient. But thanks to SMF's binary databases, it creates a major headache. Can someone from the SMF team please unravel this maze?
Friday, May 4, 2007
LSI MegaRaid SAS driver (Thanks dlg!)
David Gwynne (dlg on #opensolaris) has created a very nice driver for LSI MegaRaid SAS controllers. You can find it here.
I have not got any hardware, so I've not tested it, but this driver is the model of simplicity and elegance for an HBA, from what I can tell, weighing in at only 1500 lines. A great deal of that is no doubt thanks to the simple model of the hardware, but the simplicity and elegance in the driver should be credited to David as well.
I'd like to sponsor this myself for integration into Nevada, but I haven't got any hardware. If you have hardware to loan for qualification testing, give me a shout, because this looks like a prime candidate for a Nevada integration.
hme gldv3 status report
The conversion of hme to gldv3 looks like it is a success. The driver "Just Worked" from the first time I loaded it. Yay.
Still to be tested are the main areas of risk: VLANs, SUSPEND/RESUME, and DDI detach. Stay tuned for more on that front.
I'm going to have to preserve qfe as a separate driver, I think, because renaming/renumbering devices is just going to cause too much grief in the field. But what I'm going to do is make hme.c and qfe.c very small (say ~50 lines each), and have them use a common misc module to provide the entire functionality.
I have now received several qfe boards as well, so I'll be testing on x86 soon, as well.
Watch this space for the code review to be posted.
For the curious, some size comparisons:
gd78059@sr1-umpk-52{8}> wc pcic/usr/src/uts/sun/io/hme.c{,.orig}
6498 19423 171291 pcic/usr/src/uts/sun/io/hme.c
8889 26403 232421 pcic/usr/src/uts/sun/io/hme.c.orig
15387 45826 403712 total
size in the kernel (as reported by modinfo): old = 63384, new 47184
Wednesday, May 2, 2007
PSARC 2007/243 approved
Subject says it all. This is the eri conversion to Nemo. There is more testing yet to be done, but note that this means eri will inherit VLAN and link aggregation support. Neat, huh?
In the Bay Area this week
I'm up in MPK this week. (Wed, Thur, and leaving Friday night.) If any fellow Solaris geeks up here are up for a pub outing, e-mail me.
Monday, April 30, 2007
doing my part
I just committed a change to move 7 previously closed drivers in Nevada to the open source tree under usr/src. This change involved nothing more than Makefile and copyright block editing, so it was pretty much a no-brainer. (Though the heavy lifting of the legal review had already been done.)
The drivers moved were: bscv, bscbus, i2bsc, gptwo_cpu, gptwocfg, todstarcat, and todm5819p_rmc.
Admittedly, none of these are likely to exist on your hardware, but it does help to have more bits open. Hopefully someday /usr/closed will either cease to exist or become its own consolidation separate from Nevada.
Friday, April 27, 2007
eri conversion update
Hmm... it looks like I never posted the webrev... well here it is, the webrev for the eri(7d) conversion to Nemo.
Now, the second bit of good news here is that the PSARC case for this has been submitted as PSARC 2007/243. Note that the case isn't published publicly at the time of writing, but it should be soon.
Thursday, April 26, 2007
afe and dmfe cases approved
FYI, the afe and dmfe cases I had at PSARC (2007/229 and 2007/221 respectively) were approved. I've already put back the dmfe code. The afe code will be committed by Alan DuBoff. I've got pre-approval to do a follow-up putback to convert afe to GLDv3 afterwards.
Note that as a result of Crossbow, there are some changes coming in GLDv3, so it is still inappropriate to use GLDv3 for unbundled drivers. (The biggest of these changes is support for "polling", where the network stack can disable interrupts on the NIC and run a separate thread to poll the device for inbound packets. On extremely high traffic systems, this can have a big impact on overall system throughput by avoiding the extra context switches.)
Wednesday, April 25, 2007
afe PSARC case number
The PSARC fasttrack to integrate afe into Nevada was assigned case number PSARC 2007/229. Notably, this case was not submitted by me (I'm not even on the interest list!), and is being done as a result of the BSD license terms for afe. It will probably be reviewed at next week's PSARC meeting.
Death to IEN-116
Finally, over 20 years since the late Jon Postel said Death To IEN-116, we have finally removed it from OpenSolaris. Who says changes in Solaris take too long?
Sunday, April 22, 2007
eri tests look good.. call for more testers
As predicted, the area of biggest risk in my conversion of eri to GLDv3 was in fact the kstat handling. However, I appear to have that all worked out now, and the binary is working flawlessly on my SunBlade 100. Even suspend/resume works fine. However, I've not yet integrated this code properly into a workspace to generate a webrev, but I will do so soon. (Probably tomorrow... I'd like to get my two other RTIs put back first.)
One of the biggest concerns about this effort was the added risk that doing this conversion might bring to the "stable" eri driver. So, I'm asking the community for help. If you want to help out with testing, especially if you have higher end systems or want to do some benchmark comparisons, please let me know.
(I don't have specific test suites to give out at this time... it's frankly of more value to have people run their own tests right now; that way we get broader test coverage than we might with a single test suite.)
Please let me know. Thanks! (Oh yeah, if you have an eri you want to try with new GLDv3-based 802.3ad link aggregation features, I'd be game for that, too!)
(PS. An obvious consequence of this effort is that it will be easy to do the work to convert hme, gem, and qfe, which share a lot of common heritage with the eri driver. So, maybe there is yet hope for those, as well.)
Friday, April 20, 2007
GLDv3 experiences
I've just finished (still testing!) my port of eri to GLDv3. Between that and looking at existing GLDv3 drivers (bge, rge, e1000g), I think I have gathered some operational experience that I hope we can use to improve Nemo. (So, anyone who says my time spent on converting eri was wasted is wrong... because if nothing else it gained some more operational experience with GLDv3.)
Executive summary of the takeaways I have gotten so far, that I think are worth noting:
- There is still a lot of code duplicated across even GLDv3 drivers (more below)
- Lock management is so much simplified
- GLDv3 kstats need "work"
- We really, really need Brussels... it can't come soon enough.
- Some drivers can probably be changed internally to work even better with GLDv3 than a naive port
So here's the detailed stuff.
- code duplication
The duplicated code falls into three major areas. ioctls (mostly ndd(1M) and loopback handling for SunVTS), kstats, and MII. For now I want to focus on the MII bit. It turns out that pretty much every Ethernet device on the planet talks to a transceiver (whether integrated into the same chip as the MAC controller or not) using MII/GMII. We have tons of logic surrounding MII and GMII replicated across each driver, and frequently the decisions made by one driver are different than those in another.
There exists an old i386 driver called mii, which was an abortive attempt to create a common module/framework for MII and PHY handling. (It's only used by the obsolete dnet driver at present.) I think this should be revived. It's been shown to work well for BSD Unix (at least NetBSD, but I'm pretty sure all of them), and it would really help simplify a lot of code. The eri driver, for example, probably has a couple thousand lines of MII-related auto-negotiation logic in it.
And of course, each of these negotiation frameworks takes a slightly different set of tunables and configuration parameters, exports different statistics, etc.
- Lock management is so much simplified
It's really easy to write a GLDv3 driver that doesn't hold locks across GLDv3 routines. I suspect a lot of deadlocks/hangs/panics are going to be solved by moving drivers to GLDv3. (Of course, we've seen locking problems higher in the stack as a result... see recent deadlocks in dls, etc. But we only need to solve those once with GLDv3. Yay.)
- The kstat framework for GLDv3 is just plain broken.
There are several problems here.
- All kstats for a media type are included, regardless of whether or not they make sense for a specific device. For example, cap_rem_fault is not supported by most of the drivers yet; when a driver doesn't have support in mac_stat(), the statistic is still included in kstat output as 0. However, pretty much any system with an 802.3u-compliant MII does in fact support the rem_fault MII field. So in this case, just because the driver isn't exporting the stat, the framework is creating an outright lie. This is probably true of other stats as well. For example, if hardware isn't prepared to report runt_errors, then it doesn't make sense to claim that value as "zero"... because you might be flooding the device with bad packets, which just get dropped on the floor (perhaps getting accounted in some other, less granular "BadPackets" counter or somesuch). Better to say nothing than to tell a lie, IMO.
- kstats are normally "snapshotted", so that you can take a snapshot of all stats at once. This is common with some hardware devices, too. Getting these stats may be expensive, though. (For example, reclaiming transmit buffers so you can collect transmit status, acquiring locks, etc. With some devices you might even have to do an expensive collection effort that would normally cv_wait for an interrupt.) Having to go through this several times (once for each stat collected) for a single snapshot is... inefficient. It would be nice to add a mac_stat_update() entry point, separate from the mac_stat() entry point. (Even better, also add a mac_stat_done() to release any resources acquired by the first call.) The good news, I think, is that hopefully we aren't going to have to support DLPI DL_GET_STATISTICS_REQ, so it should be safe to cv_wait in mac_stat()-related calls now (unlike with older GLDv2). We aren't supporting the DLPI statistics calls, are we? Please say we aren't....
- If the driver wants to export any additional driver-specific statistics, it has to do the whole kstat dance itself, in addition to the Nemo mac_stat() entry point. Let's try to find a way for drivers to export/register additional driver-specific kstats within the existing Nemo framework, please?
- Duplication. E.g. for bge, there is a "bge0" kstat, created by dls, as well as a "mac" kstat created by the mac module. Both of these have some common counters, like ipackets64, brdcstxmt, etc. What's worse, one stat in particular, "unknowns", is counted by the dls framework in the "bge0" stat, but is not counted by the "mac" stat. This can lead to confusion. The duplication also makes the snapshot problem already mentioned worse, since it appears that most of the stats are generated just by calling mac_stat() a second time for the same values already recorded in the "mac" kstat.
- Inadequate list of kstats in the default set. I found several kstats which were missing. Several of them got fixed as a result of PSARC 2007/220, but I've since found a few others. E.g. Ethernet devices commonly can detect "jabber timeouts". These should be reported somehow. Also, stats about network-related interrupts are really important, and aren't included by default. I consider this a significant shortcoming. I guess devices should register a KSTAT_TYPE_INTR kstat, but approximately none of them do today.
- Stat cleanups in drivers. This is mostly a driver-specific problem, but look at the kstat output on bge and e1000g, and see what I'm talking about. There is a total lack of consistency here.
- We really need Brussels.
From the above, you see the problems with kstats. There are similar problems with NDD. The amount of code scattered around different drivers trying to figure out NIC tuning is boggling. And most of it isn't what you'd call "sterling examples of quality". The eri driver was full of some really, really fragile code here. (Deleting one tunable... the instance ndd parameter... required updating no fewer than 4 different locations in the driver. And they weren't conveniently co-located.)
Interpretation of values, handling, all of it is terribly replicated across so many drivers. I can't wait to eradicate this crufty, horrid code, and replace it with something nice and sane from Brussels.
- Some drivers can change internally to work even better with GLDv3.
In eri, for example, I think we can be smart on the transmit side, so that, for example, when a group of mblks comes down, we don't kick the hardware and resync the descriptor rings until all the packets are queued for transmit. This would help amortize some per-packet expenses across multiple packets.
Other drivers can benefit from multiaddress support. dmfe falls into that category.
That said, my approach so far has been the naive conversion. I'd like to revisit a few of them to enhance them to take advantage of the superior design in GLDv3, but first I want to get them put back.
Wednesday, April 18, 2007
dmfe crossbow conversion
In case you ever wondered what it takes to convert a "simple" GLDv2 driver to Nemo, have a look at the webrev I posted earlier today.
I'm hoping that this work will get integrated soon. As an upshot, dmfe with this change "just works" with dladm show-dev.
report from the battery team
I'm now a member of the "battery team". I had a very productive con-call with the folks involved, and I think we are going to soon have a better common framework for battery APIs in the kernel so that SPARC systems can also take advantage of the gnome battery applet. Watch this space!
afe integration web rev posted
For the curious, I've posted a webrev containing the changes required to integrate afe into Nevada.
The driver includes changes from the stock AFE driver for Solaris, including some lint fixes, and changes to use the stock Solaris sys/miireg.h.
I'd love to make more changes to this driver, but at the moment I don't want to cause a test reset. Once the driver is integrated, I have a bunch more improvements coming... Nemo, multiple mac address support, VLAN support, link notification support (needed for NWAM), as well as code reduction by using some features that are now part of stock Solaris (like the common MII framework!)
Thursday, April 12, 2007
Tadpole SPARCLE support putback
Core support for SPARCLE was just put back! I'm getting ready to post an initial tadpmu for public review soon, as well. This should make you SPARCLE/Sun Ultra 3 owners out there happy.
Wednesday, April 11, 2007
Not All Broadcom GigE's are Equal
Recently, I posted a blog entry where I described that "Not All GigE Are Equal", strongly advocating the use of Broadcom GigE devices when faced with a choice.
However, after spending time in the code, I've discovered that there is quite a range of differences amongst Broadcom gigE devices.
I had considered listing a full table of them, but it seems that would be a bit onerous. Take a look at usr/src/uts/common/io/bge/bge_chip2.c if you want to find out the gory details. But in the mean time, here are my recommendations:
If you have PCI or PCI-X: Choose a bcm5704 if you can. It has pretty much full feature support, but you need to pick a recent revision (newer than A0). Look for PCI IDs of pci14e4,1646, pci14e4,16a8, or pci14e4,1649. These chips all support PCI-X, full checksum offload, and multiple hardware tx and rx rings.
If you have PCIe: As far as I can tell, all of the PCIe chips that Solaris supports lack support for multiple hardware tx/rx rings. This is really unfortunate, as it will have a negative impact on Crossbow benefits. But apart from that, it looks like the 5714-series chips are your best bet. They support jumbo frames, and they have full checksum offload support. Look for PCI IDs of pci14e4,1668, pci14e4,1669, pci14e4,1678, or pci14e4,1679.
What this really says, is if you have to choose between a PCI-X card and a PCIe card, surprisingly, choose the PCI-X card (if you can get a 5704). Save your PCIe for framebuffers or HBAs. (Or, better, 10G cards like Neptune.)
blogger Atom bugs
As part of setting up the Tadpole project, I tried to use a feed direct from Blogger, but the OpenSolaris tonic infrastructure doesn't like it. Apparently the feed has some problems, which you can see by looking at the output from feedvalidator. Anyway, I was able to work around by using feedburner to convert the blogger Atom feed into a clean RSS feed. Maybe at some point some Blogger staff will look at this and see what the problem is.
hackergotchi... thanks Gman!
Gman (Glynn) made a hackergotchi from a photo I sent him, which is used on planet.opensolaris.org. His gimp-fu is great. Thanks Gman!
Monday, April 9, 2007
Saturday, April 7, 2007
First Tadpole code review posted
The first review for Tadpole platform support is online now. Please let me know your thoughts, after reading it. There will be more good stuff coming soon, I hope. (Also, if you have a Tadpole platform other than a SPARCLE or UltraBook IIi, and are willing to test, please let me know!)
Thanks!
Who's Who?
I just received two e-mails (identical to each other) stating this:
I'm not entirely convinced this is a worthwhile thing... but I'm willing to play along until they ask me for money. Anyone else out there received these before?
Dear DAmore Garrett,
The Heritage Registry of Who's Who is recognizing you for possible inclusion in the upcoming 2007-08 Edition. Please go to http://theheritageregistryofwhoswho.com and click on the invitation button.
Thank You,
Chris Jespersen
Friday, April 6, 2007
Tadpole project proposed
FYI, I recently proposed a new project to track improvements to support for Tadpole platforms in OpenSolaris. It looks like it got the seconds needed, so I'm just waiting for the infrastructure to be created.
first putback!
I just made my first putback to ON (6487387 pcic driver contains obsolete & private Tadpole code that should be removed).
While this is nothing earth shattering, hopefully I'll be making a lot more commits soon.
Thursday, April 5, 2007
Inland Empire Solaris Users?
I've been wondering how many other OpenSolaris users there are out there in the Inland Empire. I recently met one close to me, which surprised me quite a bit. I figured I was the only one within at least a 30 mile radius.
If there are others of you out there, please drop me a line. I'd like to inquire as to whether it makes sense to consider starting a User's Group for the area. Possibly we could join up with any other User Groups for Southern California.
For the record I live in southwest Riverside county, not far from Temecula and Murrieta. (For those of you not familiar with the west coast, the Inland Empire refers to a large region of southern California that is separated from the coastal areas of Orange and Los Angeles counties by a range of coastal mountains. I often have joked that I'm about 65 miles from any natural technology center, but now I'm not so sure. And I think a lot of people who commute to places like San Diego and LA live out here.)
Sunday, April 1, 2007
ancient history (IEN-116 must die!)
Funny note. When I came back to Sun (two weeks ago), I discovered that an ancient PSARC case (2002/356) for the removal of the Trivial Name Server (in.tnamed) had never been completed. So for 5-odd years since, we've continued to ship this long-since-obsolete protocol. I'm going to go ahead and drive forward with the actual removal... treating it as a case study in how much process is involved with even a simple EOF. Let's see how long this one takes. (For the record, the IEN-116 protocol was obsolete as far back as 1986, when J. Postel first requested vendors ditch it.)
afe and mxfe pending updates
Those of you using afe (and also mxfe) will be pleased to note that the time is fast approaching when afe will hopefully be integrated into Solaris Nevada. There is a PSARC fasttrack scheduled for it next week if I understand correctly. (I don't have the case number yet.)
There are a few ramifications of this. One of the most immediate is that I'm going to be winding down support for versions of Solaris earlier than 10. In fact, I no longer have any personal installations running anything less than S10u3, and most everything is running Nevada.
The other reason for me to do this is so that I can immediately start taking advantage of some features that are present in Solaris 10 and Nevada. For example, I want to add support for DLPI link notification, and ultimately (in Nevada) port to GLDv3.
GLDv3 has some compelling features, and as a result afe and mxfe will gain support for features like VLANs, jumbo frames, and interrupt blanking. And they'll also benefit from the increased performance afforded by the GLDv3 framework.
It isn't clear to me that I'll be supporting GLDv3 for Solaris 10 (the interfaces are not yet public), but at least in Nevada I will. And even for S10, I'll probably be using new GLDv2 features that are not available to older releases. (Like the DLPI link notification.)
Before I do this, I will be spinning one last significant bug fix release for afe and mxfe, which addresses several significant bugs found by Sun's QA group. (Including the fact that afe has not functioned properly with multicast since it was first written!)
Watch the web page for more details.
Saturday, March 31, 2007
Not All GigE Are Equal
As a consequence of work I've been doing lately since I joined Sun, I've learned some things that folks who care a great deal about performance might like to know.
The most important of these is that not all gigabit cards are created equal. And even among those that are, some of them get preferential treatment at Sun.
One surprise: the gigE device that gets the most preferential treatment is not a Sun-branded NIC. In fact, it's a device that you can readily find at your local computer retailer.
I speak of bge.
The bge (Broadcom) NIC has some very, very sophisticated logic on it, that Crossbow is going to be able to take advantage of to get you some very nice performance acceleration, plus some greatly added support for QoS and stack virtualization. If you're thinking about a NIC, my first choice would be a Broadcom NIC.
The Cassini (Sun Gigaswift) has many of the same features, but costs more and is harder to find. And the Cassini isn't supported by some of the Crossbow features -- yet. This issue will of course get resolved, but for now, your best bet is a Broadcom NIC, especially if you want to run Solaris.
The other commodity NICs (RealTek 8169, Intel Pro/G, etc.) are certainly nice enough, but the features on these NICs are an incremental update over similar 100 Mbit hardware, and don't hold a candle to the separate hardware rings, advanced classification engines, and similar features present in the Broadcom and Cassini hardware. (Notably, these features will be more important with 10G NICs, and devices like Neptune -- Sun's 10G offering -- will feature them prominently.) And finally, now that 10G and stack virtualization need these features, Solaris is going to start taking advantage of them. Some of this is already in Nevada, and more is on the way soon.
I wouldn't be surprised if other high-end NIC developers (Intel? Marvell?) start offering these features in future updates, although I expect some players (such as RealTek) will continue to focus on much simpler (and hence cheaper) devices.
Tuesday, March 27, 2007
Congratulations new OGB
It's official!
The OpenSolaris Constitution has been ratified. Yay!
Congratulations to the new OGB as well. I'm generally very pleased with the election results, despite not getting elected myself.
A few interesting tidbits:
1) Rich Teer is the only non-Sun OGB member. (And apparently he has done some work for Sun.)
2) The entire OGB seems to be made up of engineers.
3) Neither female candidate was elected.
4) There are two ARC members sitting. (James Carlson and Alan Coopersmith.)
5) There seems to be good geographical representation... i.e., MPK (and the SF Bay Area in general) doesn't seem to be overweighted.
6) Several members have sat on other FOSS boards (at least Glynn and Alan).
I would have liked to see someone with more marketing and program management experience elected.
In future years, I'd like to see the process for Core Contributor grants revised. I think only folks who are active in the community should have this role ... I think a lot of people got grants just because they committed code as part of their day jobs at Sun. Also, there should be a limit to how many Core Contributors a given community can elect... the large number of user-group contributors could have had an adverse effect on results. I would also like to see 1) term limits, and 2) limits on the number of members working for (affiliated with) any one employer. (Not more than 3.) But those are ideas for the new OGB to think about.
Again, congratulations to the new OGB, and a BIG thank-you to everyone who voted (regardless of whether you voted for me or not.)
Thursday, March 22, 2007
Just Voted
I just cast my ballot for the OGB and ratification of the constitution.
I will not tell you who I voted for, but I will say two things.
First off, the decision was hard. There are some excellent candidates running. I'm pretty confident that we are going to have a great OGB, made up of reasonable individuals who are passionate about OpenSolaris. (Yes, I did vote for myself, but I also voted for quite a few other people... there are 7 seats, after all.)
Second, I did vote to ratify the Draft Constitution. I hope you did, too.
The window of time to cast your ballot is quickly drawing to a close. Polls close on Monday, so be sure to cast your ballot before then. Unlike some others, I waited a bit, primarily because I wanted to hear what some of the other candidates had to say. So even if you have not voted yet, please do so today. Even if you abstain from voting on the candidates, at least make a statement on the ratification of the constitution. I believe it is important to get at least a 51% turnout. I believe there are around 260 eligible voters, and so far only 85 ballots have been cast.
So, as Glynn says, think of the kittens and VOTE!
Tuesday, March 20, 2007
first day at Sun
Apparently, I am not the only one who started here at Sun yesterday. Auspicious? Wait and see...
Meantime, I spent yesterday meeting the group, and getting to know what I'll be working on. There's some exciting stuff going on, and while I can't really talk about it now, the good news is that eventually the good stuff will make it into OpenSolaris.
Tuesday, March 13, 2007
OGB interview
Simon Phipps (aka webmink) interviewed me yesterday as part of the series of interviews he is conducting with the OGB candidates. This is an excellent way to get some idea of each of the candidates; if you're a Core Contributor, I recommend checking out his blog.
Thursday, March 8, 2007
OGB Position Statement
As you probably know, I'm running for a seat on the OpenSolaris Governing Board (OGB). I've answered a number of questions already, but I really like how Keith Wesolows has set up a position statement, so much so that I'm going to do the same. Much of my layout and presentation follows his, but rest assured that there are real meaty differences here. :-)
Before I do that: after reviewing a lot of the material that has been submitted by nominees, I do think there is a really good group of passionate, level-headed candidates running... it's likely that I'll be happy with the board that is elected regardless of whether I'm on it or not. Now, on to the details... I'll start with my positions on things, along with a short bio at the end.
- The Constitution
YES.
The document isn't perfect, but it is an excellent start, and we can address issues with it as we move along. Without it, the OGB cannot exist, and the project will not be able to exist as a legitimate entity independent of Sun.
- Community Structure
We can do better.
The biggest issue I have with the way communities and projects are laid out now is that it seems very ad hoc. However, I'm not sure that we should throw the baby out with the bath water. I think there is a need for communities to exist without specific code bases, and many projects may have efforts which span otherwise logical code boundaries (e.g., work in both the ON consolidation and the Gnome consolidation.)
We can simply provide better guidance on the creation of groups and the conditions under which a group should be terminated, and clarify how projects are created and terminated.
I do believe that every Project should have a Group that "owns" it, and that some of the Groups we have created (using DTrace as an example) should really have been projects under some umbrella Project.
- Change control, SCM, and ARCs
Open it up!
Historically "significant" changes to Solaris have had to go through ARCs. (Architectural Review Committees.) As painful as the process can often seem to be, I think that the end result has been the creation of the finest operating system on the planet. So I don't want to muck with it too much.
However, I believe that if OpenSolaris is going to really be Open, then the review process for it needs to be open, and that these ARCs need to include representation from stakeholders in the community. And I think there can be some minor tweaking to help the community in dealing with this process. I would consider this the first step that OGB should be pushing Sun for, simply because it does not require any change in the technology or tools to make it happen right now.
Ultimately, the community needs to have the ability to access the code directly, and there needs to be some powers that are delegated to the community, such as RTI advocacy. More on that later.
- Licensing
No simple answer.
This seems to be a hotbed of controversy. Let me state some things unequivocally. First off, I do not believe that changing licensing is at this time likely to significantly improve the mindshare or marketshare of OpenSolaris. Second, I do believe there is a real risk of a license-driven fork if the "core" of OpenSolaris is dual-licensed. Third, for various reasons, I believe that wholesale relicensing of OpenSolaris to the GPL (either v2 or the draft v3 that I've seen) is a bad idea that will ultimately severely harm a number of commercial stakeholders (among them Sun, but not just Sun.)
However, I'm not a license zealot, and I believe that there may be cases where dual licensing to enable certain components of OpenSolaris to be used in foreign projects may ultimately be in the best interests of the project. If the goal of a project is to have as many people as possible using a certain technology, then it is also in its best interest to eliminate the barriers to use. For example, the project may decide that Linux adoption of DTrace is something that has concrete benefits. In this case, those portions could be dual licensed, on a case by case basis.
One important stumbling block, however, is the concern about enhancements to the code from an external project under a license other than CDDL. Ultimately I would prefer to see such contributions able to come back to Open Solaris under CDDL. I'm not sure how best to achieve this, though; and if the concern arises, it may be a good idea to get some legal assistance with the matter.
- On Commercial Use and Intellectual Property
I make my living writing software.
Furthermore, I've worked at Sun in the past, expect to start working there again in about two weeks, and have most recently worked at a significant Sun licensee and technical partner.
I would like to see Open Solaris continue to encourage commercial endeavors based on it. We should continue to have a friendly relationship with Sun and other companies (such as my soon-to-be-previous employer) and encourage their participation.
I also believe that there is a case for software patents, but that the patent process is badly busted and needs repair. You won't find me at a protest against patents or against companies that make money selling software, but you also won't find any content I don't have a legal right to have on any of my systems. But you will find me contributing time, energy, and actual code to Open Source projects I believe in. Right now Open Solaris is the first and foremost of those.
- The Role of OGB
Strong non-technical leadership.
I believe OGB has a limited number of tasks to accomplish. First, I think OGB can help provide non-technical leadership for the Open Solaris umbrella. By that, I mean that OGB can help direct what markets the project should be pursuing, how we present ourselves, arrangement of constitutional matters (such as the election itself), etc. Second, OGB can help act as a liaison between the community (and projects and groups within it) and other third parties, including Sun. Third, this first OGB has, IMO, an important goal of getting some of the infrastructure and political issues worked out (perhaps by creating a separate group) to finish making the project independent of Sun. Finally, it can act as the final arbiter in cases where an issue cannot be resolved at a lower level (such as at a project or group level.)
I do not believe that OGB should be providing any technical direction whatsoever, although some of its members will undoubtedly do so as a result of their other roles. Indeed, apart from the fairly large set of tasks the initial OGB has to perform to address organizational and administrative shortcomings, I do not expect the OGB to be very involved in the day to day running of the project.
- The role of Sun
A contributor, perhaps the most important contributor.
(Disclaimer: I will shortly be working for Sun -- indirectly, but they will be paying my salary nonetheless. I am running as an individual, however, because I'm passionate about Solaris. I just happen to be going to work for Sun for the same reason. Well, that, and the nice salary...)
Sun is a major contributor, and the copyright holder for nearly all of the source we work with. However, by creation of the Open Solaris project, Sun has made it clear that it wants the project to stand on its own.
There are various technical, administrative, and political barriers still standing in the way. None of them are insurmountable. I think OGB should be actively trying to drive solutions to these problems (perhaps not the actual solutions themselves, but make sure that the right people are working on them, and track the progress.)
- Technical Matters
Yes, I'm an engineer.
I think I've already stated that OGB shouldn't be involved in the technical direction of Open Solaris. However, here are a few items that have been asked about already.
Binary compatibility. Much of the appeal of Solaris has been the fact that it has been stable from release to release. I would be hard pressed to find a case for breaking this. Stability levels and standards like the DDI have been a key value proposition for Solaris in the past, and will continue to be so if we don't break it.
Quality first. The various processes that Sun has had in place for Solaris for over a decade now have done a lot to ensure that Solaris is the high quality product it is. Change for its own sake is bad. Peer review is good. Oversight review by a strong (genius!) technical leadership (ARC) is even better. I don't want to change this culture. I do want to figure out how to bring non-Sun parties to the table, though, and involve the whole community. It's going to be tough for some people who are used to "commit first, fix it later, and hope that someone else will document it" policies in certain other open source projects. But we'll all be better for it.
Now, all that being said, there are probably questions I forgot to answer, or simply haven't thought of. You can ask me, or the whole group of candidates at board-candidates-2007@opensolaris.org. I'll post any of my replies here.
About Me (Bio)
I'm a 35 year old software engineer from southern California, where I currently reside. (I telecommute.) I have a BS in Computer Science from San Diego State University (class of '95). My areas of expertise are kernel, device drivers, embedded systems, networking, and security. I've hacked on Linux, NetBSD, and Solaris, as well as some proprietary kernels (including the Sun Ray firmware). Far and away I prefer Solaris -- I could go on and on about its benefits... but you already know them, right?
I've worked as a UNIX systems administrator at Qualcomm, as a software engineer (systems software and device drivers) at Sun, and as a kernel, networking, and thin-client engineer at Tadpole Computer (now General Dynamics.) I'm expecting to start a new job as a Contractor at Sun working on Solaris starting March 19, 2007. (My whole resume is online if you really want to know the details.)
I also have a loving wife, and am blessed with two girls and a boy (ages 5, 7, and 6). We also have two cats.
When I'm not in front of a computer, or spending time at home with the family, I enjoy white water kayaking (including ocean surfing in playboats), sailing, skiing, swimming and reading.
New blog
This is my new blog, replacing the short-lived Celestial Bodies, Amphibians, and Reptiles blog I set up while an employee at General Dynamics (formerly Tadpole Computer.)
There will be more here in a short bit, particularly an update on my position for the OpenSolaris Governing Board (OGB).