Sunday, July 8, 2007

hme GLDv3 and *hardware checksum*

So I've been trying to run my GLDv3 port of hme through a very rigorous battery of tests called "nicdrv" (the test suite used for recent NIC drivers by Sun QE... hopefully soon to be open sourced, but that's another topic.)

Anyway, the test system I've been using is a poor little 360MHz US-II Tadpole system. (A Darwin-workalike, in a shoe-box form factor.)

Unfortunately, the test times out while trying to do the UDP RX tests. Which really shouldn't be surprising... the test was designed for gigabit network gear, with gigahertz system processors (or better.)

Well, it turns out that the hme driver can be faster. Considerably faster. The hardware supports IP checksum offload, but it was never enabled. (Note that this is true for the quad-port qfe boards as well, which are basically the same controller behind a bridge chip.)

So, I've decided to have another go at getting a fully successful test result with this hardware, by modifying the driver to support IP checksum offload. I'm hoping it may make the difference between a pass and a fail. With tiny frames, every little bit helps.
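For a sense of the work that offload takes away from the CPU, here is a minimal sketch of the ones'-complement Internet checksum in software, in the style of RFC 1071. This is not code from the hme driver; the function name `inet_checksum` is made up for illustration, and a real stack would use a far more optimized routine.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only (hypothetical name, not from the hme driver).
 * Computes the 16-bit ones'-complement Internet checksum over len bytes,
 * treating the data as big-endian 16-bit words.
 * Assumes len is small enough (under 64 KB) that the 32-bit
 * accumulator cannot overflow before the carries are folded.
 */
uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	/* Sum the data as a sequence of 16-bit words. */
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}

	/* An odd trailing byte is padded with a zero low byte. */
	if (len == 1)
		sum += (uint32_t)data[0] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* The checksum is the ones' complement of the folded sum. */
	return ((uint16_t)~sum);
}
```

Every byte of the packet passes through that loop, once per packet, and that per-byte work is what checksum offload moves from the CPU to the NIC.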

Stay tuned here. Note that I also have logged performance data from earlier test runs, so I'll be able to compare that as well. One additional wrinkle in all this is that I now feel compelled to test this with SBus hme hardware. The oldest system I can find is a Sun Ultra 2. (Older Ultra 1 systems with 200 MHz and slower procs won't work. If anyone has an old Ultra 1 with 250 MHz or better procs running Nevada, let me know!)


bager said...

Excellent! I am very interested in results comparing DLPI, GLDv3 w/o hw checksum, and GLDv3 with checksum on an UltraSPARC II based system.

Are there any other hardware features on hme which the old driver was not taking advantage of? What about Cassini? Does the existing ce driver make good use of all the hardware offload Cassini can do?

Garrett D'Amore said...

The hardware feature I'm not making full use of in hme GLDv3 (in my driver) is polling support. This is a potential performance assist for very overloaded low-end hardware. On anything with more than a single processor, it's unlikely to make a noticeable difference.

It appears that hme *may* support a receive MAC address filter that allows more than a single unicast address to be configured. But since I don't have documentation handy, it's hard to be sure. (It's a guess from looking at the header files.) Such a feature could be useful with Crossbow.

I am taking advantage of the tunable frame support in hme, which allows hme to process full-mtu VLAN frames. Note that the support in hardware is not adequate to enable jumbo frames. (The max frame size supported by the hardware is 4K.)
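The arithmetic behind that can be sketched as follows. The constant and function names are hypothetical (not from the hme sources), and two things are assumptions on my part: that "4K" means 4096 bytes, and that the hardware limit counts the whole frame including the trailing FCS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the hme driver. */
enum {
	SK_ETH_HDR_LEN = 14,	/* dst MAC (6) + src MAC (6) + type (2) */
	SK_VLAN_TAG_LEN = 4,	/* 802.1Q tag */
	SK_ETH_FCS_LEN = 4,	/* trailing CRC */
	SK_HW_MAX_FRAME = 4096	/* assumption: "4K" read as 4096 bytes */
};

/* On-the-wire size of a VLAN-tagged frame carrying an mtu-byte payload. */
size_t
sk_vlan_frame_len(size_t mtu)
{
	return (mtu + SK_ETH_HDR_LEN + SK_VLAN_TAG_LEN + SK_ETH_FCS_LEN);
}

/* True if a tagged frame at this MTU fits under the assumed limit. */
bool
sk_fits_hw(size_t mtu)
{
	return (sk_vlan_frame_len(mtu) <= SK_HW_MAX_FRAME);
}
```

With the standard 1500-byte MTU, a tagged frame comes to 1522 bytes, which fits easily. A typical 9000-byte jumbo MTU would need 9022 bytes, well past the limit.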

You will find other GLDv3 features which aren't part of hardware support, but which are nice. 802.3ad link aggregation (sometimes called "trunking" or "teaming"), VLAN support, DLPI link notification, and IP instance support are all there.

Btw, I also tuned the receive code path slightly. For dealing with large numbers of small packets, this may have a noticeable impact.

Hardware features notably lacking in hme/qfe are 802.3x flow control and advanced interrupt mitigation. Both of these features were introduced with the RIO chip (aka GEM).

Cassini is a totally different beast. It has very good offload support in its driver, but when the Crossbow framework comes out, it won't be able to take full advantage of its hardware features. This could change in the future, but it will require a GLDv3 ce driver, which is a "skunkworks" project I've been working on, unsanctioned by the current Cassini development staff.