Wednesday, August 1, 2007

partial checksum bug

While investigating a fix for 6587116 (a bug in HME; more on that later), we found a glaring bug in the implementation of UDP checksums on Solaris.

Specifically, it appears that UDP hardware checksum offload is broken whenever the checksum calculation yields a 16-bit value of 0. Other protocols (TCP, ICMP, etc.) simply transmit 0 as the checksum in this case.
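To make "a 16-bit value of 0" concrete: the Internet checksum is the ones' complement of the ones' complement sum of the data taken as 16-bit words (RFC 1071), so the value placed in the header is 0 exactly when the folded sum is 0xffff. A minimal C sketch, assuming big-endian word order; the function names are ours, not anything in the Solaris source, and the 32-bit accumulator assumes buffers well under 128 KiB so it cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add 'len' bytes to a running ones' complement sum, treating the data
 * as big-endian 16-bit words (RFC 1071). An odd trailing byte is padded
 * with a zero low-order byte. Carry folding is deferred, which is safe
 * for buffers well under 128 KiB.
 */
static uint32_t
cksum_accumulate(uint32_t sum, const uint8_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len > 1) {
		sum += (uint32_t)((buf[0] << 8) | buf[1]);
		buf += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)(buf[0] << 8);
	return (sum);
}

/* Fold the carries back into the low 16 bits (end-around carry). */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/*
 * The value written into the header: the complement of the folded sum.
 * It is 0 exactly when the folded sum is 0xffff.
 */
static uint16_t
cksum_finish(uint32_t sum)
{
	return ((uint16_t)~cksum_fold(sum));
}
```

For example, the words 0x1234 and 0xedcb sum to 0xffff, so `cksum_finish()` returns 0; TCP would put that 0 on the wire unchanged.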

UDP, however, specifies (RFC 768) that the value 0xffff be substituted for 0. Why? Because 0 has a special meaning. In IPv4 networks, it means that the transmitter did not bother to include a checksum. In IPv6, the checksum is mandatory, and RFC 2460 says that a receiver that sees a UDP packet with a zero checksum must discard it. The substitution itself is safe because 0x0000 and 0xffff both represent zero in ones' complement arithmetic, so the receiver's verification still succeeds.
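The sender rule and the receiver's three possible outcomes can be sketched as follows. This is an illustration only: `udp_tx_field`, `udp_rx_check` and the verdict names are made up for this post and are not Solaris interfaces, and the receiver is assumed to have already summed everything except the checksum field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement accumulator to 16 bits. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Sender side (RFC 768): a computed checksum of 0 goes out as 0xffff. */
static uint16_t
udp_tx_field(uint16_t computed)
{
	return (computed == 0 ? 0xffff : computed);
}

enum udp_rx_verdict {
	UDP_RX_OK,		/* checksum present and correct */
	UDP_RX_UNCHECKED,	/* IPv4 only: sender sent no checksum */
	UDP_RX_DROP		/* bad checksum, or zero checksum over IPv6 */
};

/*
 * Receiver side. 'rest_sum' is the (unfolded) ones' complement sum of
 * the pseudo-header, the UDP header with the checksum field excluded,
 * and the payload; 'field' is the checksum field as received.
 */
static enum udp_rx_verdict
udp_rx_check(bool ipv6, uint32_t rest_sum, uint16_t field)
{
	if (field == 0)
		return (ipv6 ? UDP_RX_DROP : UDP_RX_UNCHECKED);
	/* A correct datagram, field included, sums to 0xffff. */
	if (fold16((uint32_t)fold16(rest_sum) + field) == 0xffff)
		return (UDP_RX_OK);
	return (UDP_RX_DROP);
}
```

When the rest of the datagram sums to 0xffff, the computed checksum is 0, the sender transmits 0xffff, and the receiver's sum 0xffff + 0xffff folds back to 0xffff, so the datagram verifies over both IPv4 and IPv6. Had the 0 gone out unchanged, an IPv6 receiver would drop it.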

The problem is that the hardware commonly used in Sun SPARC systems (hme, eri, ge, and probably also ce and nxge) does not support this substitution. Furthermore, the current spec gives the driver no way to know whether the substitution should be applied, short of parsing the packet directly, which presents its own challenges and costs performance.
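The root of the problem is that a partial-checksum engine is handed only byte offsets, not a protocol. The sketch below models such an engine in software. `hw_partial_cksum` is a hypothetical name, and we assume (as is common for partial offload) that the stack has already seeded the checksum field with the pseudo-header sum, so the engine only has to sum from the start offset to the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical software model of a protocol-agnostic partial-checksum
 * engine. It is given only offsets into the frame:
 *   start - first byte to sum
 *   stuff - where the 16-bit result is stored
 *   end   - one past the last byte to sum
 * Assumption: the stack pre-seeded the field at 'stuff' with the
 * pseudo-header sum, so summing [start, end) covers everything.
 */
static void
hw_partial_cksum(uint8_t *pkt, size_t start, size_t stuff, size_t end)
{
	uint32_t sum = 0;
	size_t i;

	assert(start <= stuff && stuff + 2 <= end);
	assert((stuff - start) % 2 == 0);	/* field is word-aligned */

	for (i = start; i + 1 < end; i += 2)
		sum += (uint32_t)((pkt[i] << 8) | pkt[i + 1]);
	if (i < end)				/* odd trailing byte */
		sum += (uint32_t)(pkt[i] << 8);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	sum = ~sum & 0xffff;

	/*
	 * The engine cannot tell UDP from TCP, so it never applies the
	 * UDP rule of sending 0xffff in place of 0.
	 */
	pkt[stuff] = (uint8_t)(sum >> 8);
	pkt[stuff + 1] = (uint8_t)(sum & 0xff);
}
```

Fed a UDP datagram whose words sum to 0xffff, this model writes 0x0000 into the checksum field, which an IPv6 receiver must discard. Making it write 0xffff instead would require knowing the packet is UDP, which is exactly what the offsets do not say.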

We will have to figure out how to deal with this problem fairly soon. My guess is that all Sun NICs will lose hardware checksum acceleration for UDP datagrams (transmit side only), and that third-party products able to apply the substitution will need another flag bit indicating UDP semantics.
