Friday, August 24, 2007

qfe GLDv3

As my first gift to the community since becoming a Sun employee, I've putback the conversion of QFE to the new hme common GLDv3 code. Now you can use your old QFE boards with IP instances, VLANs, whatever. Go wild. Hopefully the rtls conversion will get putback tonight as well... still waiting for my RTI advocate to approve it.

Tuesday, August 14, 2007

Stuck with an rtls? (Realtek 8139)

I've recently hacked up the Realtek driver (rtls) to support GLDv3. It's part of usr/closed right now (though I hope we can open source it!), so I can only share binaries.

Anyway, if you're stuck with this driver on your x86 system (usually because it's on your motherboard), and you want to try running a GLDv3 version of the driver, let me know.

GLDv3 brings link aggregation support, VLAN support, and virtualization (IP instances) with it.

Of course the hardware is still somewhat crummy, so I wouldn't expect to get much performance out of it. But again, if you're stuck with it (as many people probably are) this may be helpful.

Monday, August 6, 2007

Dropping the "C"

For those not in the know, it's now official. I'll be (re-)joining Sun as a regular full-time employee starting August 20th. That means that I get to drop the "C" in front of my employee ID.

I'll be reporting to Neal Pollack, initially working on various Intel related Solaris projects.

Wednesday, August 1, 2007

hme checksum limitations

(This blog is as much for the benefit of other FOSS developers as it is for OpenSolaris.)

Please have a look at 6587116, which points out a hardware limitation in the hme chipset. I've found that at least NetBSD, and probably also Linux, are affected, because they expect the chip to support hardware checksum offload for every packet. However, if the packet is less than 64 bytes (not including FCS), the hardware IP checksum engine will fail. This means all packets that get padded, and even some that are otherwise legal (not needing padding), will not be checksummed properly.

For these packets, software checksum must be used.
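
A minimal sketch of that fallback decision, in C. The names here (HME_HWCKSUM_MIN_LEN, hme_can_offload_cksum) are hypothetical and are not taken from any real hme driver; the only thing assumed from the bug report is the 64-byte threshold, measured without the FCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: shortest frame (in bytes, FCS not counted)
 * for which the hardware checksum engine can be trusted. */
#define HME_HWCKSUM_MIN_LEN 64

/* Returns true if the frame is long enough to hand to the hardware
 * checksum engine; false means the checksum must be done in software.
 * frame_len is the length on the wire without the FCS. */
static bool
hme_can_offload_cksum(size_t frame_len)
{
    assert(frame_len > 0);      /* an empty frame is a caller bug */
    return (frame_len >= HME_HWCKSUM_MIN_LEN);
}
```

Note that a 60-byte frame is a legal minimum-size Ethernet frame once the 4-byte FCS is added, yet it still falls below the threshold, which is the "otherwise legal" case mentioned above.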

partial checksum bug

As a result of investigation of a fix for 6587116 (a bug in HME, more later), we have found a gaping bug in the implementation of UDP checksums on Solaris.

Most particularly, it appears that UDP hardware checksum offload is broken for the cases where the checksum calculation will result in a 16-bit value of 0. Most protocols (TCP, ICMP, etc.) specify that the value 0 be used for the checksum in this case.

UDP, however, specifies that the value 0xffff be substituted for 0. Why? Because 0 is given special meaning. In IPv4 networks, it means that the transmitter did not bother to include a checksum. In IPv6, the checksum is mandatory, and RFC 2460 says that when the receiver sees a packet with a zero checksum it should be discarded.

The problem is, the hardware commonly in use on Sun SPARC systems (hme, eri, ge, and probably also ce and nxge) does not have support for this particular semantic. Furthermore, we have no way to know, in the current spec, if this semantic should be applied (short of directly parsing the packet, which presents its own challenges and hits to performance).

We'll have to figure out how to deal with this particular problem, sometime soonish. My guess is that all Sun NICs will lose IP checksum acceleration (transmit side only) for UDP datagrams, and that those 3rd party products which can do something different will need another flag bit indicating UDP semantics.
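
For anyone who wants to see the UDP rule concretely, here is a rough software sketch in C. The function names (inet_cksum, udp_cksum_fixup) are made up for illustration and are not Solaris or driver interfaces. The sketch assumes the caller has already laid out the bytes to be summed (for real UDP that includes the pseudo-header), and it only shows the final step that the hardware discussed above does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic Internet checksum: 16-bit one's-complement sum of the
 * buffer (big-endian words, odd trailing byte padded with zero),
 * folded to 16 bits and then complemented. */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;

    assert(buf != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;
    while (sum >> 16)           /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* UDP-only step: a computed checksum of 0 is sent as 0xffff,
 * because 0 on the wire means "no checksum" (IPv4) or is
 * invalid (IPv6). TCP and ICMP would send the 0 unchanged. */
static uint16_t
udp_cksum_fixup(uint16_t cksum)
{
    return (cksum == 0 ? 0xffff : cksum);
}
```

A transmit path doing this in software would call udp_cksum_fixup(inet_cksum(...)) for UDP and plain inet_cksum(...) for the other protocols; the point is that something has to know which protocol it is looking at.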

Friday, July 27, 2007

nxge and IP forwarding

You may or may not be aware of project Sitara. One of the goals of project Sitara is to fix the handling of small packets.

I have achieved a milestone... using a hacked version of the nxge driver (diffs available on request), I've been able to get UDP forwarding rates as high as 1.3M packets per sec (unidirectional) across a single pair of nxge ports, using Sun's next sun4v processor. (That's number of packets forwarded...) This is very close to line rate for a 1G line. I'm hoping that future enhancements will get us to significantly more than that... maybe as much as 2-3 Mpps per port. Taken as an aggregate, I expect this class of hardware to be able to forward up to 8Mpps. (Some Sun internal numbers using a microkernel are much higher than that... but then you'd lose all the nice features that the Solaris TCP/IP stack has.)

By the way, it's likely that these results are directly applicable to applications like Asterisk (VoIP), where small UDP packets are heavily used. Hopefully we'll have a putback of the necessary tweaks before too long.
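
To put the 1.3M figure in context, here is a small back-of-the-envelope helper in C. It assumes standard Ethernet framing overhead (8 bytes of preamble plus a 12-byte inter-frame gap per frame) and minimum-size 64-byte frames; the test traffic's actual frame size isn't stated above, so treat the comparison as an estimate. The function name is made up for this example.

```c
#include <assert.h>

/* Per-frame overhead on the wire that is not part of the frame
 * itself: 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20

/* Theoretical maximum frames per second on an Ethernet link.
 * link_bps is the link speed in bits per second; frame_bytes is
 * the frame size including the FCS (64 for a minimum frame). */
static double
eth_line_rate_pps(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes > 0);
    return (link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8.0));
}
```

For a 1G link and 64-byte frames this works out to roughly 1.49 Mpps, so 1.3 Mpps would be somewhere near 87% of the theoretical maximum under those assumptions.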

mpt SAS support on NetBSD

FYI, NetBSD has just gained support for the LSI SAS controllers, such as the one found on the Sun X4200. My patch adding this support was committed last night. (The work was a side project funded by TELES AG.)

Of course we'd much rather everyone ran Solaris on these machines, but if you need NetBSD for some reason, it works now.

Pullups to NetBSD 3 and 3.1 should be forthcoming.