Friday, July 27, 2007

nxge and IP forwarding

You may or may not be aware of project Sitara. One of the goals of that project is to improve the handling of small packets.

I have achieved a milestone: using a hacked version of the nxge driver (diffs available on request), I've been able to get UDP forwarding rates as high as 1.3M packets per second (unidirectional) across a single pair of nxge ports, using Sun's next sun4v processor. (That's the number of packets actually forwarded.) This is very close to line rate for a 1G link. I'm hoping that future enhancements will get us significantly beyond that... maybe as much as 2-3 Mpps per port. Taken as an aggregate, I expect this class of hardware to be able to forward up to 8 Mpps. (Some Sun internal numbers using a microkernel are much higher than that... but then you'd lose all the nice features that the Solaris TCP/IP stack has.)

By the way, it's likely that these results are directly applicable to applications like Asterisk (VoIP), where small UDP packets are heavily used. Hopefully we'll have a putback of the necessary tweaks before too long.

mpt SAS support on NetBSD

FYI, NetBSD has just gained support for the LSI SAS controllers, such as the one found in the Sun X4200. My patch adding this support was committed last night. (The work was a side project funded by TELES AG.)

Of course we'd much rather everyone ran Solaris on these machines, but if you need NetBSD for some reason, it works now.

Pullups to NetBSD 3 and 3.1 should be forthcoming.

Wednesday, July 25, 2007

hme GLDv3 versus qfe DLPI

So, the NICs group recently told me I should have started with qfe instead of hme, because qfe has some performance fixes. (Such as hardware checksum, which I added to hme!) To find out if this holds water, I ran some tests on my 360 MHz UltraSPARC-IIi system, using a PCI qfe card. (You can make hme bind to the qfe ports by doing

# rem_drv qfe
# update_drv -a -i '"SUNW,qfe"' hme

This, by the way, is a nice hack to use GLDv3 features with your qfe cards today, if you cannot wait for an official GLDv3 qfe port.)

Anyway, here's what I found out, using my hacked ttcp utility. Note that the times reported are "sys" times.

QFE/DLPI

MTU = 100, -n 2048
Tx: 18.3 Mbps, 7.0s (98%)
Rx: 5.7 Mbps, 2.4s (10%)

MTU = 1500, -n = 20480
Tx (v4): 92.1 Mbps, 1.1s (8%)
Rx (v4): 92.2 Mbps, 1.6s (12%)
Tx (v6): 91.2 Mbps, 1.1s (8%)
Rx (v6): 90.9 Mbps, 2.6s (22%)

UDPv4 tx, 1500 (-n 20480) 90.5 Mbps, 1.6s (64%)
UDPv4 tx, 128 (-n 204800) 34.2 Mbps, 5.2s (99%)
UDPv4 tx, 64 (-n 204800) 17.4 Mbps, 5.1s (99%)

And here are the numbers for hme with GLDv3

HME GLDv3

MTU = 100, -n 2048
Tx: 16.0 Mbps, 7.6s (93%)
Rx: 11.6 Mbps, 1.8s (16%)

MTU = 1500, -n = 20480
Tx (v4): 92.1 Mbps, 1.2s (8%)
Rx (v4): 92.2 Mbps, 3.2s (24%)
Tx (v6): 90.8 Mbps, 0.8s (6%)
Rx (v6): 91.2 Mbps, 4.0s (29%)

UDPv4 tx, 1500 (-n 20480) 89.7 Mbps, 1.5s (60%)
UDPv4 tx, 128 (-n 204800) 29.4 Mbps, 6.0s (99%)
UDPv4 tx, 64 (-n 204800) 14.8 Mbps, 6.0s (99%)

So, given these numbers, it appears that either qfe is genuinely more efficient (which is possible, but I'm slightly skeptical) or the extra overhead of some of the GLDv3 support is hurting us. I'm more inclined to believe the latter. (For example, we have to check whether each packet is VLAN tagged... those features don't come for free... :-)
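
To give a concrete, purely illustrative idea of what I mean, here is roughly the sort of per-packet test that a VLAN-aware path ends up making. This is a sketch of mine, not the actual hme or GLDv3 framework code:

#include <sys/types.h>
#include <sys/byteorder.h>
#include <sys/ethernet.h>

/*
 * Illustrative only: is this an 802.1Q tagged frame?  ETHERTYPE_VLAN
 * is 0x8100.  One compare per packet looks cheap, but on a 360 MHz
 * CPU pushing tiny packets, none of these little costs are truly free.
 */
static boolean_t
frame_is_vlan_tagged(const struct ether_header *ehp)
{
    return (ntohs(ehp->ether_type) == ETHERTYPE_VLAN);
}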

What is really interesting is that the hme GLDv3 driver was about 3% better than the old DLPI hme. So clearly more effort has been invested in qfe than in hme.

Interestingly enough, the Rx performance for tiny packets is better with GLDv3. I am starting to wonder if there is a difference in the bcopy/dvma thresholds between the two drivers.
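
In case that jargon is opaque: the idea behind a bcopy/dvma threshold is that below some packet size it is cheaper to just copy the data (into a premapped transmit buffer, or into a fresh receive mblk) than to set up a DMA binding or loan a driver buffer up the stack. A minimal sketch, with a made-up threshold value and function name:

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/strsun.h>

#define PKT_BCOPY_THRESH    256    /* bytes; illustrative value only */

/*
 * Decide whether a packet is small enough that copying beats the cost
 * of a DMA bind/unbind (transmit) or a buffer loan-up (receive).
 */
static boolean_t
pkt_should_bcopy(mblk_t *mp)
{
    return (msgsize(mp) < PKT_BCOPY_THRESH);
}

If hme and qfe ended up with different thresholds, that alone could account for a swing in the tiny-packet numbers.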

So one of the questions that C-Team has to answer is how important these relatively minor differences in performance really are. On a faster machine, you'd be unlikely to notice them at all. If this performance difference becomes a gating factor, I might find it difficult to putback the qfe GLDv3 conversion.

To be completely honest, tracking down the 1-2% difference in performance may not be worthwhile. I'd far rather work on finding 1-2% gains in the stack than worry about how a certain legacy driver performs.

What are your thoughts? Let me know!

Tuesday, July 17, 2007

afe and mxfe status notes

For those of you that care, I think we're in the home stretch for integration of afe and mxfe into Nevada.

I spent the weekend going through the code and reworking large portions of it to make use of zero-copy DMA wherever it was rational to do so (loan-up for receive, direct binding for transmit).
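
For the curious, here is roughly the shape of the transmit side of that change, using the stock DDI DMA interfaces. This is a hedged sketch under my own naming, not the actual afe or mxfe code:

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/strsun.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/*
 * Bind one mblk's data directly for transmit DMA instead of bcopy'ing
 * it into a preallocated buffer.  A real driver walks the b_cont chain
 * and binds (or copies) each fragment, then unbinds and frees the
 * handle when the transmit completes.
 */
static int
xmit_bind(dev_info_t *dip, ddi_dma_attr_t *attrp, mblk_t *mp,
    ddi_dma_handle_t *dmah, ddi_dma_cookie_t *cookiep, uint_t *ncookiep)
{
    int rv;

    rv = ddi_dma_alloc_handle(dip, attrp, DDI_DMA_DONTWAIT, NULL, dmah);
    if (rv != DDI_SUCCESS)
        return (rv);

    rv = ddi_dma_addr_bind_handle(*dmah, NULL,
        (caddr_t)mp->b_rptr, MBLKL(mp),
        DDI_DMA_WRITE | DDI_DMA_STREAMING,
        DDI_DMA_DONTWAIT, NULL, cookiep, ncookiep);
    if (rv != DDI_DMA_MAPPED) {
        ddi_dma_free_handle(dmah);
        return (rv);
    }

    /* The caller programs the tx descriptor(s) from the cookie(s). */
    return (DDI_DMA_MAPPED);
}

The receive-side "loan up" is the mirror image: rather than copying out of the driver's own DMA buffer, you wrap it in a desballoc()'d mblk and hand that up the stack, reclaiming the buffer when its free routine fires.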

The NICDRV test suite has also identified a number of issues with edge cases that didn't come up often, but which I'm glad to know about and have fixed in the version of the code getting putback.

They're only 100 Mbps NICs, but the version of the code going into Nevada will make them run at pretty much the same speed as any other 100 Mbps NIC without IP checksum offload.

And they are still 100% DDI compliant. :-) Thankfully, the DDI has been extended for OpenSolaris since the last time I worried about such things (back in the Solaris 8 days).

Anyway, looking forward to putback in b70 or b71. (Depending on whether I can get reviewers in time for b70 putback. If you can help me review, please let me know!)

Thursday, July 12, 2007

HME putback done

In case anyone ever wondered what a putback message looked like:

*********  This mail is automatically generated  *******

Your putback for the following fix(es) is complete:

PSARC 2007/319 HME GLDv3 conversion
4891284 RFE to add debug kstat counter for promiscuous mode to hme driver
6345963 panic in hme
6554790 Race between hmedetach and hmestat_kstat_update
6568532 hme should support GLDv3
6578294 hme does not support hardware checksum


These fixes will be in release:

snv_70

The gate's automated scripts will mark these bugs "8-Fix Available"
momentarily, and the gatekeeper will mark them "10-Fix Delivered"
as soon as the gate has been delivered to the WOS. You should not
need to update the bug status.

Your Friendly Gatekeepers

Btw, the case to make this work for qfe (PSARC 2007/404) was approved yesterday as well. There are some internal resourcing questions yet to be answered, but at least architecturally, the approach has been approved.

I would really, really love it if some qfe owners would file a bug asking for qfe to be converted to GLDv3. It would make it much easier for me, I think, if this case were seen as a response to customer demand. (So many people have requested qfe GLDv3 support... please file a bug! Even better, file an *escalation*!)

Note: none of this is eligible for backport to S10. You have to use OpenSolaris if you want the good stuff. Gotta keep a few carrots in reserve, right? (Seriously, ndd and Sun Trunking incompatibilities make it unsuitable for backport to S10 anyway.)

Sunday, July 8, 2007

hme GLDv3 and *hardware checksum*

So I've been trying to run my GLDv3 port of hme through a very rigorous battery of tests called "nicdrv" (the test suite used for recent NIC drivers by Sun QE... hopefully soon to be open sourced, but that's another topic.)

Anyway, the test system I've been using is a poor little 360 MHz US-II Tadpole system. (A Darwin workalike, in a shoe-box form factor.)

Unfortunately, the test times out while trying to do the UDP RX tests. Which really shouldn't be surprising... the test was designed for gigabit network gear, with gigahertz system processors (or better.)

Well, it turns out that the hme driver can be faster. Considerably faster. Because the hardware supports IP checksum offload. But it was never enabled. (Note that this is true for the quad-port qfe boards as well, which are basically the same controller behind a bridge chip.)

So, I've decided to have another go at getting a fully successful test result with this hardware, by modifying the driver to support IP checksum offload. I'm hoping it may make the difference between a pass and a fail. With tiny frames, every little bit helps.
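
For those wondering what that involves at the GLDv3 level: the driver advertises the capability through its mc_getcapab() entry point, and then flags individual packets so the stack knows the hardware did (or will do) the work. Here is a rough sketch of the capability half; the function name is mine, and the details of what eventually gets putback may well differ:

#include <sys/types.h>
#include <sys/mac.h>

/*
 * Sketch: tell the MAC layer this hardware can compute a partial
 * (ones-complement) checksum on transmit.
 */
static boolean_t
xx_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
    uint32_t *txflags;

    switch (cap) {
    case MAC_CAPAB_HCKSUM:
        txflags = cap_data;
        *txflags = HCKSUM_INET_PARTIAL;
        return (B_TRUE);
    default:
        return (B_FALSE);
    }
}

The other half is per-packet: on receive, the driver attaches the checksum the hardware computed to the mblk, so the stack doesn't have to grind through the payload in software... which is exactly where a 360 MHz CPU feels the pain.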

Stay tuned here. Note that I also have logged performance data from earlier test runs, so I'll be able to compare against that as well. One additional wrinkle in all this is that I now feel compelled to test this with SBus hme hardware. The oldest system I can find is a Sun Ultra 2. (Older Ultra 1 systems with 200 MHz and slower processors won't work. If anyone has an old Ultra 1 with 250 MHz or better processors running Nevada, let me know!)