Friday, October 19, 2007

Sun Opening up more...

So I was recently informed that senior management has directed my team to move some of the engineering work we have been doing "internally" to the OpenSolaris web site. This is more than just moving internal web pages outside the firewall, though. This is about starting to do the actual engineering work in the open.

The first two projects that my group is going to be doing this with are the laptop suspend/resume effort and the SD card stack. (The SD card stack needs to go through some licensing approval first, as the SDA organization doesn't allow for redistribution without a license. The "open" specs are "Evaluation Only", apparently.)

Anyway, this is a sign that the practices already being done elsewhere in the company (cf. the networking group) are starting to take hold elsewhere, even in demesnes that have historically been strongholds of the NDA.

Watch the laptop project page at os.o over the next week or so to see what we put up there... and there will be mailing lists for the project engineering as well!

Tuesday, October 9, 2007

Backyard gallery

I recently checked our pool contractor's website, and found my backyard in his gallery. It looks pretty cool on his website. Check it out here.

Thursday, October 4, 2007

dmfe putback done

I just putback a bunch of dmfe changes... most notably, the driver now supports add-in PCI cards. Look for it in snv76.

Wednesday, September 26, 2007

SecureDigital, and other memory formats

So I've been tasked with writing a driver for the SecureDigital controller found on certain laptops. As part of this effort, I'd like to take a straw poll. If you could e-mail or post a comment on this blog, indicating your response, I'd be grateful. Note that this is only for folks with card readers that are *not* USB connected. (USB readers are already covered by USB mass storage.)

a) how many SDcard slots do you have?
b) do you use older MMC media?
c) do you have any SDIO peripherals? (i.e., devices other than memory cards.)
d) do you have slots other than SDcard (such as xD or memstick) that are important to you? Which ones? (Again, not for USB connected media readers!)
e) I'm interested in prtconf -vp output for different readers. If you're game, send it to me, along with the make and model of your laptop.

Thanks!

dmfe for x86 and unbundled PCI nics on SPARC

I've got a dmfe driver working on both SPARC and x86, and it supports some unbundled PCI NICs. (The original driver only worked with onboard dm9102s on certain SPARC hardware.) The Davicom 9102 part is not terribly common, but some low end NICs sold by companies like C/Net and Buffalo use it.

Hopefully this will be putback into snv75.

oops, qfe is b74

I lied... (not intentionally, I got confused.) The qfe GLDv3 port is in b74, not b73. Sorry!

Wednesday, September 5, 2007

snv_73 goodness

Solaris Nevada b73, when it comes out, is going to have a lot of good NIC stuff in it.

* afe driver (putback yesterday)
* mxfe driver
* rtls on SPARC (including suspend/resume!)
* qfe GLDv3 support

Plus, there are a lot of good networking improvements; a lot of stale code was removed (defunct mobile IPv4 support, detangling NAT, etc.)

There's a bunch of carryover from snv_70-72 too...

I, for one, can hardly wait.

Thursday, August 30, 2007

mxfe RTI....

FYI,

Earlier today I submitted the RTI for mxfe. I expect afe (which will be more popular) to follow later this week or early next. (We've fallen behind on some of the testing.)

I've also started looking at porting rtls to SPARC, and making it support SUSPEND/RESUME. More on that shortly.

Friday, August 24, 2007

rtls GLDv3

And now rtls is GLDv3. Not open source (yet), and no SPARC support, but hopefully those will both get fixed soon. Have fun!

qfe GLDv3

As my first gift to the community since becoming a Sun employee, I've putback the conversion of QFE to the new hme common GLDv3 code. Now you can use your old QFE boards with IP instances, VLANs, whatever. Go wild. Hopefully the rtls conversion will get putback tonight as well... still waiting for my RTI advocate to approve it.

Tuesday, August 14, 2007

Stuck with an rtls? (Realtek 8139)

I've recently hacked up the Realtek driver (rtls) to support GLDv3. It's part of usr/closed right now (though I hope we can open source it!), so I can only share binaries.

Anyway, if you're stuck with this driver on your x86 system (because it's on your motherboard, usually), and you want to try running a GLDv3 version of the driver, let me know.

The GLDv3 brings link aggregation support, VLAN support, and virtualization (IP instances) with it.

Of course the hardware is still somewhat crummy, so I wouldn't expect to get much performance out of it. But again, if you're stuck with it (as many people probably are) this may be helpful.

Monday, August 6, 2007

Dropping the "C"

For those not in the know, it's now official. I'll be (re-)joining Sun as a regular full-time employee starting August 20th. That means I get to drop the "C" in front of my employee ID.

I'll be reporting to Neal Pollack, initially working on various Intel related Solaris projects.

Wednesday, August 1, 2007

hme checksum limitations

(This blog is as much for the benefit of other FOSS developers as it is for OpenSolaris.)

Please have a look at 6587116, which points out a hardware limitation in the hme chipset. I've found that at least NetBSD, and probably also Linux, suffer in that they expect the chip to support hardware checksum offload. However, if the packet is less than 64 bytes (not including the FCS), the hardware IP checksum engine will fail. This means that all packets that get padded, and even some that are otherwise legal (not needing padding), will not be checksummed properly.

For these packets, software checksum must be used.
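The software fallback here is just the standard RFC 1071 one's-complement checksum, gated by a length check against the hardware limit described above. A minimal Python sketch (the constant and function names are my own illustration, not the actual driver code):

```python
HME_MIN_HW_CKSUM_LEN = 64  # bytes, excluding FCS; shorter frames break the engine

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum (the software fallback)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return (~total) & 0xFFFF

def can_offload_checksum(frame_len: int) -> bool:
    """True if the hme checksum engine can be trusted with this frame."""
    return frame_len >= HME_MIN_HW_CKSUM_LEN
```

Frames that fail the length check (padded runts included) have to take the `internet_checksum()` path in software.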

partial checksum bug

While investigating a fix for 6587116 (a bug in hme; more later), we found a gaping bug in the implementation of UDP checksums on Solaris.

In particular, it appears that UDP hardware checksum offload is broken for the cases where the checksum calculation results in a 16-bit value of 0. Most protocols (TCP, ICMP, etc.) specify that the value 0 be used for the checksum in this case.

UDP, however, specifies that the value 0xffff be substituted for 0. Why? Because 0 is given special meaning. In IPv4 networks, it means that the transmitter did not bother to include a checksum. In IPv6, the checksum is mandatory, and RFC 2460 says that when the receiver sees a packet with a zero checksum, it should be discarded.
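The substitution rule itself is one line of software; the trouble is that the offload engines have no way to apply it. A sketch of the rule (the function name is my own, for illustration):

```python
def udp_wire_checksum(computed: int) -> int:
    """Map a computed 16-bit UDP checksum to the value put on the wire.

    0 is reserved: in IPv4 it means "no checksum supplied", and in IPv6
    (RFC 2460) a zero checksum gets the packet discarded. A computed 0 is
    therefore transmitted as 0xffff, which is numerically equivalent in
    one's-complement arithmetic.
    """
    return 0xFFFF if computed == 0 else computed
```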

The problem is, the hardware commonly in use on Sun SPARC systems (hme, eri, ge, and probably also ce and nxge) does not have support for this particular semantic. Furthermore, we have no way to know, in the current spec, if this semantic should be applied (short of directly parsing the packet, which presents its own challenges and hits to performance).

We'll have to figure out how to deal with this particular problem, sometime soonish. My guess is that all Sun NICs will lose IP checksum acceleration (transmit side only) for UDP datagrams, and that those 3rd party products which can do something different will need another flag bit indicating UDP semantics.

Friday, July 27, 2007

nxge and IP forwarding

You may or may not be aware of project Sitara. One of the goals of project Sitara is to fix the handling of small packets.

I have achieved a milestone... using a hacked version of the nxge driver (diffs available on request), I've been able to get UDP forwarding rates as high as 1.3M packets per sec (unidirectional) across a single pair of nxge ports, using Sun's next sun4v processor. (That's number of packets forwarded...) This is very close to line rate for a 1G line. I'm hoping that future enhancements will get us to significantly more than that... maybe as much as 2-3 Mpps per port. Taken as an aggregate, I expect this class of hardware to be able to forward up to 8Mpps. (Some Sun internal numbers using a microkernel are much higher than that... but then you'd lose all the nice features that the Solaris TCP/IP stack has.)

By the way, it's likely that these results are directly applicable to applications like Asterisk (VoIP), where small UDP packets are heavily used. Hopefully we'll have a putback of the necessary tweaks before too long.

mpt SAS support on NetBSD

FYI, NetBSD just gained support for the LSI SAS controllers, such as the one found on the Sun X4200. My patch to fix this was committed last night. (The work was a side project funded by TELES AG.)

Of course we'd much rather everyone ran Solaris on these machines, but if you need NetBSD for some reason, it works now.

Pullups to NetBSD 3 and 3.1 should be forthcoming.

Wednesday, July 25, 2007

hme GLDv3 versus qfe DLPI

So, the NICs group recently told me I should have started with qfe instead of hme, because qfe has some performance fixes. (Such as hardware checksum, which I added to hme!) To find out if this holds water, I ran some tests on my 360 MHz UltraSPARC-IIi system, using a PCI qfe card. You can make hme bind to the qfe ports by doing:

# rem_drv qfe
# update_drv -a -i '"SUNW,qfe"' hme

(This, by the way, is a nice hack to use GLDv3 features with your qfe cards today if you cannot wait for an official GLDv3 qfe port.)

Anyway, here's what I found out, using my hacked ttcp utility. Note that the times reported are "sys" times.

QFE/DLPI

MTU = 100, -n 2048
Tx: 18.3 Mbps, 7.0s (98%)
Rx: 5.7 Mbps, 2.4s (10%)

MTU = 1500, -n = 20480
Tx (v4): 92.1 Mbps, 1.1s (8%)
Rx (v4): 92.2 Mbps, 1.6s (12%)
Tx (v6): 91.2 Mbps, 1.1s (8%)
Rx (v6): 90.9 Mbps, 2.6s (22%)

UDPv4 tx, 1500 (-n 20480) 90.5 Mbps, 1.6s (64%)
UDPv4 tx, 128 (-n 204800) 34.2 Mbps, 5.2s (99%)
UDPv4 tx, 64 (-n 204800) 17.4 Mbps, 5.1s (99%)

And here are the numbers for hme with GLDv3

HME GLDv3

MTU = 100, -n 2048
Tx: 16.0 Mbps, 7.6s (93%)
Rx: 11.6 Mbps, 1.8s (16%)

MTU = 1500, -n = 20480
Tx (v4): 92.1 Mbps, 1.2s (8%)
Rx (v4): 92.2 Mbps, 3.2s (24%)
Tx (v6): 90.8 Mbps, 0.8s (6%)
Rx (v6): 91.2 Mbps, 4.0s (29%)

UDPv4 tx, 1500 (-n 20480) 89.7 Mbps, 1.5s (60%)
UDPv4 tx, 128 (-n 204800) 29.4 Mbps, 6.0s (99%)
UDPv4 tx, 64 (-n 204800) 14.8 Mbps, 6.0s (99%)

So, given these numbers, it appears that either QFE is more efficient (which is possible, but I'm slightly skeptical), or the extra overhead of some of the GLDv3 support is hurting us. I'm more inclined to believe the latter. (For example, we have to check whether the packet is VLAN tagged... those features don't come for free... :-)

What is really interesting is that the hme GLDv3 work was about 3% better than the old DLPI hme. So clearly there has been more effort invested into qfe.

Interestingly enough, the performance for Rx tiny packets with GLDv3 is better. I am starting to wonder if there is a difference in the bcopy/dvma thresholds.

So one of the questions that C-Team has to answer is: how important are these relatively minor differences in performance? On a faster machine, you'd be unlikely to notice at all. If this performance becomes a gating factor, I might find it difficult to putback the qfe GLDv3 conversion.

To be completely honest, tracking down the 1-2% difference in performance may not be worthwhile. I'd far rather work on fixing 1-2% gains in the stack than worry about how a certain legacy driver performs.

What are your thoughts? Let me know!

Tuesday, July 17, 2007

afe and mxfe status notes

For those of you that care, I think we're in the home stretch for integration of afe and mxfe into Nevada.

I spent the weekend going through the code, and reworking large portions of it to make use of zero-copy DMA wherever it was rational to do so (loan up for receive, direct binding for transmit).

The NICDRV test suite has also identified a number of issues with edge cases that didn't come up often, but which I'm glad to know about and have fixed in the version of the code getting putback.

They're only 100 Mbps NICs, but the version of the code going into Nevada will make them run at pretty much the same speed as any other 100 Mbps NIC without IP checksum offload.

And, they are still 100% DDI compliant. :-) Thankfully the DDI has been extended for OpenSolaris since the last time I worried about such things (back in Solaris 8 days).

Anyway, looking forward to putback in b70 or b71. (Depending on whether I can get reviewers in time for b70 putback. If you can help me review, please let me know!)

Thursday, July 12, 2007

HME putback done

In case anyone ever wondered what a putback message looked like:

*********  This mail is automatically generated  *******

Your putback for the following fix(es) is complete:

PSARC 2007/319 HME GLDv3 conversion
4891284 RFE to add debug kstat counter for promiscuous mode to hme driver
6345963 panic in hme
6554790 Race between hmedetach and hmestat_kstat_update
6568532 hme should support GLDv3
6578294 hme does not support hardware checksum


These fixes will be in release:

snv_70

The gate's automated scripts will mark these bugs "8-Fix Available"
momentarily, and the gatekeeper will mark them "10-Fix Delivered"
as soon as the gate has been delivered to the WOS. You should not
need to update the bug status.

Your Friendly Gatekeepers

Btw, the case to make this work for qfe (PSARC 2007/404) was approved yesterday as well. There are some internal resourcing questions yet to be answered, but at least architecturally, the approach has been approved.

I would really, really love it if some qfe owners would file a bug asking for qfe to be GLDv3. It would make it much easier for me, I think, if this case were seen as a response to customer demand. (So many people have requested qfe GLDv3 support... please file a bug! Even better, file an *escalation*!)

Note: none of this is eligible for backport to S10. You have to use OpenSolaris if you want the good stuff. Gotta keep a few carrots in reserve, right? (Seriously, ndd and Sun Trunking incompatibilities make it unsuitable for backport to S10 anyway.)

Sunday, July 8, 2007

hme GLDv3 and *hardware checksum*

So I've been trying to run my GLDv3 port of hme through a very rigorous battery of tests called "nicdrv" (the test suite used for recent NIC drivers by Sun QE... hopefully soon to be open sourced, but that's another topic.)

Anyway, the test system I've been using is a poor little 360 MHz US-II Tadpole system. (A Darwin workalike, in a shoe-box form factor.)

Unfortunately, the test times out while trying to do the UDP RX tests. Which really shouldn't be surprising... the test was designed for gigabit network gear, with gigahertz system processors (or better.)

Well, it turns out that the hme driver can be faster. Considerably faster. Because the hardware supports IP checksum offload. But it was never enabled. (Note that this is true for the quad-port qfe boards as well, which are basically the same controller behind a bridge chip.)

So, I've decided to have another go at getting a fully successful test result with this hardware. By modifying the driver to support IP checksum offload. I'm hoping it may make the difference between a pass and fail. With tiny frames, every little bit helps.

Stay tuned here. Note that I also have logged performance data from earlier test runs, so I'll be able to compare that as well. One additional wrinkle in all this is that I now feel compelled to test this with SBus hme hardware. The oldest system I can find is a Sun Ultra 2. (Older Ultra 1 systems with 200 MHz and slower procs won't work. If anyone has an old Ultra 1 with 250 MHz or better procs running Nevada, let me know!)

Thursday, June 28, 2007

GLDv3 iprb putback

I just putback the GLDv3 conversion of iprb. It will be in the next SXDE/SXCE (b69 and later). It is still closed source, but I think that may change soon, too. (All the technical information in the code is reproduced in a public open-source developer's guide downloadable from Intel, with the exception of the binary microcode, which is in the FreeBSD tree under an Intel-owned BSD license.)

Anyway, I'm told Sun is having a meeting with Intel, and one of the agenda items is opening the source to iprb.

Meanwhile, enjoy the GLDv3 goodness.

Monday, June 25, 2007

afe GLDv3-ified

I've converted "afe" to GLDv3 in anticipation of it getting putback. I've also greatly simplified the buffering logic in it, because I was trying to be "too clever" and I think we were seeing failures during the extreme testing that Sun QA likes to perform.

Anyway, this means that when afe gets putback (it's on a schedule for snv68, but that may or may not happen), it will be GLDv3. Yay. Here's something to whet your appetite:

garrett@doc{44}> pfexec dladm show-link
eri0 type: non-vlan mtu: 1500 device: eri0
afe0 type: non-vlan mtu: 1500 device: afe0

This was done on a Sun Blade 100. No more legacy NICs!

This is also helpful for laptop owners, because afe is one of the more common cardbus devices. So, your cardbus 10/100 NIC will work with NWAM.

If folks running snv_66 or newer want test binaries, let me know. I can offer them up in exchange for beer.

Wednesday, June 20, 2007

mxfe code reviewers sought

I'm also looking for folks to review my mxfe driver. It is posted at http://cr.opensolaris.org/~gdamore/mxfe

Thanks!

hme code reviewers sought

I need to get code review coverage over the hme GLDv3 conversion. This is also a good chance to learn what it takes to convert a legacy DLPI driver to GLDv3.

If you can help out, please look at the code at http://cr.opensolaris.org/~gdamore/nemo-hme/

The sooner I can get quality code review and test coverage, the sooner we can put this back! :-)

The Need For Public Test Suites

Now that OpenSolaris is supposed to be "Open", the community needs a way to perform quality assurance tests so that community contributions do not block on Sun QA.

Currently, putback of code changes to Solaris requires QA validation. For example, in order to putback my updated iprb and hme drivers (or my new mxfe driver), I have to get QA coverage. This means that I also have to get the time of someone from Sun, which can be challenging.

In order to free the community from Sun's grip, we have to have alternatives, so that community members can perform testing themselves, providing the quality assurance (Open)Solaris needs without blocking progress.

Hopefully someday soon the efforts of the folks who own the test suites to open them up will address this problem. For now, we just have to wait...

Monday, June 18, 2007

On life, the Universe, and everything...

Well, maybe not so much the Universe as our own galaxy...

I recently came across a statement that there are an estimated 100 billion stars in our galaxy. I started to wonder about the odds of us encountering sentient life in the galaxy, so I ran some rough calculations, just to estimate.

Astronomers estimate that approximately three out of four stars may harbor planets. (Basically, any single-star system, plus any binary system where the companions orbit at least as far from one another as Pluto orbits our own star.) Again, these are rough estimates. So maybe 75 billion planetary systems exist in our own galaxy!

For the moment, let's call the probability of a planetary system harboring a planet capable of supporting life p. Let's call the probability of life developing on such a planet l. Let's call the probability of sentient life developing from more primitive forms (at any point, without regard to the time it takes) s. Further, let's assume that the average age of all stars in our galaxy is close to 5 billion years. And let's assume that the time it takes for sentient (not necessarily civilized!) life to develop is close to what it took here on Earth, and that a sentient life form remains on the planet for about 100,000 years. (This is similar to the span of time that has been postulated since the first cavemen appeared here on Earth.)

Then we can estimate the number of planets which currently harbor sentient life in our galaxy as:

75 * 10^9 * p * l * s * 100,000 / (5 * 10^9)

Simplifying terms:

1.5 * 10^6 * p * l * s

As probabilities p, l, and s approach unity, we have approximately 1.5 million sentient species in the galaxy right now! (Regardless of whether they are space-faring or not.) This also ignores galaxies other than our own (there may be 100 billion such galaxies!)

Assume 1 percent for each of these probabilities, and an entirely different picture emerges:

1.5 * 10^6 * .01 * .01 * .01 = 1.5 species in our galaxy right now.

The real question is: what are the values of p, l, and s? Well, let's look at them:

Looking at our own Solar system, we have 8, 9, or more planets, and a number of moons. At least one of them supports life (Earth!). It's likely that at some point in its history, the conditions for supporting life were present on Mars, and it is also possible that conditions for supporting life may exist elsewhere in our own system. (We are considering various moons, for example.) So it does not seem unrealistic to hypothesize a fairly large value for p. Let's randomly pick a value of .75.

The value of l still seems a bit unclear. I certainly hope that we find it to be fairly large, but observationally we have no information other than our own planet. A sample size of 1 is too small to tell us anything. But if we assume that other places in the galaxy are somewhat likely to have undergone processes similar to our planet's, the probability may be at least .25. (Again, a semi-random value.)

With those values, we should have 1.5 million * .75 * .25 = 281,250 planets harboring life (of any kind) in our galaxy.

What is the value of s? Well, that's the biggest question. But even if it is quite small, say .01, then we still have a nice value of around 2,800 different sentient species in our galaxy right now!
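The arithmetic above is easy to sanity-check in a few lines of Python (all the constants are this post's own rough assumptions, nothing more):

```python
STARS_IN_GALAXY = 100e9        # estimated stars in our galaxy
PLANETARY_FRACTION = 0.75      # ~3 of 4 stars may harbor planets
SENTIENT_SPAN_YEARS = 100_000  # assumed lifetime of a sentient species
MEAN_STAR_AGE_YEARS = 5e9      # assumed average stellar age

def sentient_species_now(p, l, s):
    """Expected number of planets currently harboring sentient life."""
    systems = STARS_IN_GALAXY * PLANETARY_FRACTION  # ~75 billion systems
    return systems * p * l * s * SENTIENT_SPAN_YEARS / MEAN_STAR_AGE_YEARS
```

With p = l = s = 1 this gives the optimistic ceiling of 1.5 million species; with p = .75 and l = .25 it gives 281,250 life-bearing planets, and adding s = .01 leaves roughly 2,800 sentient species.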

I'm really hopeful that now that we are getting the instrumentation to locate these extrasolar planets, and even tell some things about them, such as their chemical makeup, we may be able to start finding evidence of these species... I hope that in the coming decade or two we will start sending out the first extrasolar probes to begin more direct observation of some of the other planetary systems. (We know that it costs quite a bit to build such a probe. But what is the incremental cost of additional probes? Could we send out 100, 1000, or even 10000 such probes to different systems?)

Thursday, June 7, 2007

eri GLDv3 and nemo fixes putback

In build 67, you'll find that eri(7D) is now a Nemo driver, with full support for IP instances, VLANs, trunking, etc.

As a consequence, you may have to fix scripts that do ndd /dev/eri since they now need to use /dev/eri0.

Nemo driver developers: you no longer need to syslog link status changes. In fact, please don't, because Nemo does it for you now.

Next up: hme, and (surprise!) iprb. (iprb was done last night on a bet... Steve owes me a beer.)

Saturday, June 2, 2007

Is GLDv3/nemo conversion of legacy drivers worthwhile?

That very question, specifically with regard to hme and qfe, but also some others, has come up lately. I'm of one mind (I think my position is clear by the very fact that I've invested effort here), but not everyone shares my opinion.

In order to have an on-line concrete resource I can point internal Sun naysayers at, I'm asking you to voice your thoughts here, by posting a follow up to this blog. (Sorry, no anonymous posts, but that means your posts will carry all that much more weight.)

Do you still use hme/qfe in systems? What about Sun Trunking with qfe? Would you upgrade to Nevada if there was GLDv3 support for these NICs? Would Nemo features in qfe/hme help you? Would a port of hme/qfe to x86 be useful to you?

Please post a follow up on my blog here, with your opinions!