Wednesday, June 20, 2007

The Need For Public Test Suites

Now that OpenSolaris is supposed to be "Open", the community needs a way to perform quality assurance tests so that community contributions do not block on Sun QA.

Currently, putting back code changes to Solaris requires QA validation. For example, in order to putback my updated iprb and hme drivers (or my new mxfe driver), I have to get QA coverage. This means that I also have to get the time of someone from Sun, which can be challenging.

In order to free the community from Sun's grip, we need alternatives that let community members perform the testing themselves, providing the quality assurance (Open)Solaris needs without blocking progress.

Hopefully someday soon the efforts of the folks who own the test suites to open them up will address this problem. For now, we just have to wait...

Monday, June 18, 2007

On life, the Universe, and everything...

Well, maybe not so much the Universe, as our own galaxy....

I recently came across a statement that there are an estimated 100 billion stars in our galaxy. I started to wonder about the odds of us encountering sentient life in the galaxy, so I started running some rough calculations, just to estimate.

Astronomers estimate that approximately three out of four stars may harbor planets. (Basically, any unary system, plus any binary system where the companions orbit at least as far from one another as Pluto orbits our own star.) Again, these are rough estimates. So, maybe 75 billion planetary systems exist in our own galaxy!

For the moment, let's call the probability of a planetary system harboring a planet capable of supporting life p. Let's call the probability of life developing on such a planet l. Let's call the probability of sentient life developing from more primitive forms (at any point, without regard to the time it takes) s. Further, let's assume that the average age of all stars in our galaxy is close to 5 billion years. And, let's assume that the time it takes for sentient (not necessarily civilized!) life to develop is close to what it took here on Earth, and that a sentient life form remains on the planet for about 100,000 years. (This is similar to the span of time that has been postulated since the first cavemen appeared here on Earth.)

Then, we can guess that the number of planets which currently harbor sentient life in our galaxy can be expressed by:

75 * 10^9 * p * l * s * 100,000 / (5 * 10^9)

Simplifying terms:

1.5 * 10^6 * p * l * s

As probabilities p, l and s approach unity, we have approximately 1.5 million sentient species in the galaxy right now! (Regardless of whether they are space-faring or not.) This also ignores galaxies other than our own (there may be 100 billion such galaxies!)

Assume 1 per cent for each of these probabilities, and an entirely different picture comes up:

1.5 * 10^6 * .01 * .01 * .01 = 1.5 species in our galaxy right now.

The real question is, what are the values of p, l and s? Well, let's look at them:

Looking at our own Solar system, we have 8, 9, or more planets, and a number of moons. At least one of them supports life (Earth!) It's likely that at some point in its history, the conditions for supporting life were present on Mars, and it is also possible that conditions for supporting life may exist elsewhere just in our own system. (We are considering various moons, for example.) So it does not seem unrealistic to hypothesize a fairly large value for p. Let's randomly pick a value of .75.

The value of l still seems a bit unclear. I certainly hope that we find it to be fairly large, but observationally we have not got any information other than our own planet. A sample size of 1 is too small to tell us anything. But, if we assume that other places in the galaxy are somewhat likely to have undergone similar processes as our planet did, the probability may be at least .25. (Again, a semi-random value.)

With those values, we should have 1.5 million * .75 * .25 = 281,250 planets harboring life (of any kind) in our galaxy.

What is the value of s? Well, that's the biggest question. But even if it is quite small, say .01, then we still have a nice value of around 2,800 different sentient species in our galaxy right now!
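
For anyone who wants to play with the numbers, here is a minimal Python sketch of the estimate above. The function and constant names are made up purely for illustration, and the inputs are the same rough guesses used in this post, so treat the output as a back-of-the-envelope figure and nothing more.

```python
# Rough inputs taken from the post; all of them are guesses, not measurements.
PLANETARY_SYSTEMS = 75e9        # about three quarters of 100 billion stars
SENTIENT_SPAN_YEARS = 100_000   # assumed time a sentient species is around
AVERAGE_STAR_AGE_YEARS = 5e9    # assumed average age of stars in the galaxy


def sentient_species(p, l, s):
    """Estimate how many sentient species exist in the galaxy right now.

    p: probability that a planetary system has a planet able to support life
    l: probability that life develops on such a planet
    s: probability that sentient life develops from more primitive forms
    """
    # Fraction of a star's lifetime during which the species is present.
    window = SENTIENT_SPAN_YEARS / AVERAGE_STAR_AGE_YEARS
    return PLANETARY_SYSTEMS * window * p * l * s


if __name__ == "__main__":
    # (p, l, s) combinations discussed in the post.
    for p, l, s in [(1.0, 1.0, 1.0), (0.01, 0.01, 0.01), (0.75, 0.25, 0.01)]:
        print(f"p={p} l={l} s={s}: {sentient_species(p, l, s):,.1f}")
```

Changing any one of the three probabilities scales the result linearly, which is why the guess for s matters so much.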

I'm really hopeful that, now that we are getting the instrumentation to locate these extra-solar planets, and even tell some things about them, such as their chemical makeup, we may be able to start finding evidence of these ... I hope that in the coming decade or two we will start sending out the first extra-solar probes to start a more direct observation of some of the other planetary systems. (We know that it costs quite a bit to build such a probe. But what is the incremental cost for additional probes? Could we send out 100, 1000, or even 10000 such probes to different systems?)

Thursday, June 7, 2007

eri GLDv3 and nemo fixes putback

In build 67, you'll find that eri(7D) is now a Nemo driver, with full support for IP instances, VLANs, trunking, etc.

As a consequence, you may have to fix scripts that do ndd /dev/eri since they now need to use /dev/eri0.

Nemo driver developers: you no longer need to syslog link status changes. In fact, please don't, because Nemo does it for you now.

Next up, hme, and (surprise!) iprb. (iprb was done last night on a bet.. Steve owes me a beer.)

Saturday, June 2, 2007

Is GLDv3/nemo conversion of legacy drivers worthwhile?

That very question, specifically with regard to hme and qfe, but also some others, has come up lately. I'm of one mind (I think my position is clear by the very fact that I've invested effort here), but not everyone shares my opinion.

In order to have an on-line concrete resource I can point internal Sun naysayers at, I'm asking you to voice your thoughts here, by posting a follow up to this blog. (Sorry, no anonymous posts, but that means your posts will carry all that much more weight.)

Do you still use hme/qfe in systems? What about Sun Trunking with qfe? Would you upgrade to Nevada if there was GLDv3 support for these NICs? Would Nemo features in qfe/hme help you? Would a port of hme/qfe to x86 be useful to you?

Please post a follow up on my blog here, with your opinions!

more network driver updates

I just completed my codereviews of eri, and got approval from the owners of the code to commit the changes. I expect that as a result the eri conversion to GLDv3 (plus major cleanups in the code) will be putback next week ... probably on Wednesday afternoon.

Why Wednesday? Well, I also need to commit PSARC 2007/298 and 2007/296, which the eri driver depends on. 2007/298 was the source of much debate lately, but I think a consensus has been achieved, and the changes should go in once I get the final blessing from PSARC (which at this point is pretty much a foregone conclusion.) Code reviews and testing have already been done.

I've also sent an mxfe card to Alan DuBoff, so he can run it through the NICDRV battery of tests. Hopefully as a result mxfe will be integrated soon. I'm anxious for Alan to commit afe (he has asked that I not take this over from him), so that I can quickly convert it to GLDv3 as well.

For some of the other legacy NICs (iprb, rtls, etc.) I've been asked to provide information about GLDv2 to v3 conversions, because apparently an intern at Sun wants to try her hand at converting at least one of them. Since these NICs are still common on PC motherboards, I applaud this effort.

As far as hme/qfe/ce go, more on that in a follow up.

Friday, May 25, 2007

GLDv2->GLDv3 conversion notes

I was recently asked to provide some notes about GLDv2 to GLDv3 conversion for NIC drivers. Here's a rough draft of them. (This is cut-n-paste from mail I sent to an intern at Sun... I'm posting them here so the knowledge isn't lost.)

It is really helpful if you don't try to implement 100% of the features of GLDv3 in the first pass. (Some of the existing GLDv3 drivers, such as rge, nge, have incorrectly provided stubs for some functions, so don't use those as references.) Specifically I would not attempt to implement mac_resource allocation (MC_RESOURCES, etc.) or multiaddress support.

You really do need to implement VLAN full frame sizes for MTU if your hardware can do it. Almost all NICs can do this. Sometimes the code to do it isn't in any Solaris driver. My preferred reference for alternate code sources is the NetBSD tree, which has an OpenGrok server for their code at http://opengrok.netbsd.org/. Ask me if you have any questions about VLANs. It helps if you have a switch where you can test VLAN frames.

With GLDv3, there is no reset function, so you have to figure out how to put that in attach (for both DDI_ATTACH and DDI_RESUME!), or in the mac_start() function.

With GLDv3 the stats are quite different. Pay attention, and look at the headers to figure it out.

Many GLDv2 drivers don't do the mac_link_update() call. You should add those for NWAM, IPMP, and correct kstat reporting.

You should rip out any attempt to log to the console on link up, down, or carrier errors. See PSARC 2007/298 for details.

GLDv3 wants to operate on mblk_t's that are chained by b_next. Often you can use the old functions from a new function that just walks down, or builds up, the list (depending on whether it's receive or transmit).

Pay careful attention to locking. Try not to call GLDv3 functions with locks held. (I have been pressing to allow mac_tx_update and mac_link_update to be called with driver locks held. Right now it's safe, but I can't seem to get a promise from the Nemo group yet.)

You see all those cyclics in some drivers? Try not to use 'em if you don't have to. What I usually do is cheat and use the on-chip timer if I need some kind of time driven functionality.

GLDv3 never explicitly initializes the physical addresses on the NIC. GLDv2 used to always call gldm_set_mac_addr()... so some drivers expect this. You may need to do that yourself in the mac_start() routine.

Anyway, maybe those notes will save someone somewhere else some effort. Or inspire someone to pick up another driver and convert it.

All these nics...

So I need to JumpStart a new system today... no problem, I'll just stick in a NIC and boot it with my etherboot PXE CDROM. No problem, right?

Well, let's see, first I need a NIC that supports Solaris. Inventorying what I have in my spare hardware today:
  • Netgear GA311, rev A1 (RealTek 8169S-32, unsupported variant of rge)
  • Netgear FA311, rev C1 (Nat-Semi DP83815D, unsupported)
  • Netgear FA310TX, rev-D2 (Lite-On LC82C169, unsupported, see below)
  • 3Com 3CR990 TX-97 (unsupported)
  • D-Link 530TX rev A-1 (dmfe, no x86 support)
  • Zyxel gigE (Via GbE chip, uncertain)
  • Linksys LNE100TX v4.1 (unsupported, yet, see below)
  • Linksys NC100 (unsupported, yet, see below)
  • Macronix MX98715AEC (unsupported, yet, see below)
  • Unbranded RTL8139B (supported, rtls, nevada only)
  • 3Com 3C900-TX (supported, elxl, for now)
Well, at least I was able to find something. Of my 11 spare NICs, two of them have marginal support. (This is only the wired ethernet NICs. I have some WLAN devices as well.)

I guess I have a habit of collecting NICs.

Now, the Linksys boards are going to soon be supported by afe, if Alan ever gets his putback of my driver done. The Macronix board will be supported by mxfe later this week, once I get it reviewed and putback.

At one point I had a driver (pnic) sort of working for the LC82C169 (Lite-On PNIC), but I abandoned it because the PNIC was such a piece of crap, that I figured anyone with one of these was better off throwing it away and replacing it with another NIC (as long as it wasn't a Realtek 8139!) Maybe I'll revive that project one day. Probably not, since Lite-On didn't sell too many of them, I think. (The PNIC has some horrible hardware bugs, and the two major revisions, the 82C169 and 82C168, have quite different methods of handling 802.3u autonegotiation.)

I also started a driver for the Nat-Semi chip (nsfe), but abandoned it. I think this chip is also found in motherboards, where it is called an SiS part. I think Murayama also has a driver available for it.

I'd really like to see support for the others expanded upon. Maybe I need to look at dmfe some more, because there really shouldn't be any reason it couldn't support x86 platforms. (D-Link sold a lot of DFE-530TX boards, IIRC.)

This also suggests that the elxl driver, which has been slated for EOF, really shouldn't be. One of the reasons I've kept that old NIC around was just because it was one of the few that was supported by Solaris 8 and earlier. I suspect I'm not the only one to have done this. I think the problem is that this driver is not open source. But open source variants exist... maybe someone should look at replacing elxl in Solaris Nevada with a FOSS replacement.

Murayama has already written drivers for some of these. I would dearly like to see his vel in Solaris Nevada, along with conversion to GLDv3.