Tuesday, July 8, 2008

Cut & Paste Device Driver Design

A common approach to device driver design is to start with a device driver for a similar piece of hardware that someone else wrote, and copy it, then modify it. However, this approach is fraught with peril, and I thought it might be a good idea to record my thoughts on this for posterity. If this post saves just one future developer from falling into this pit, it will have been worthwhile.

The first problem is that most device drivers have a number of non-trivial "features" in them that relate to the specific issues of a particular piece of hardware. Often this means that there are internal design features of the driver that are not ideal for a new driver. (A classic example would be nxge's use of particular hacks relating to the location of it in relation to the IOMMU on certain Niagra processors. No other driver is ever going to need that.)

The second problem is that most device drivers are not 100% bug free. The copy-modify approach to coding tends to lead to bugs being copied from one driver to another.

The third, and possibly most significant problem, is that when the copy-modify approach is used, the author doing the copying rarely has a complete understanding of all of the original source driver's features, and the end result is a new driver that the original author isn't aware of (and isn't maintaining), and with code that the new author might not fully understand.

A fourth problem is that real drivers are complicated. I frequently hear advice given to new NIC driver authors to "just copy bge". That is terrible advice to give to someone who may be trying to get their head around a new DDK. As a real driver, bge carries a lot of legacy baggage with it, that new drivers definitely shouldn't repeat. All that baggage obfuscates the original code, and the would-be-copier may have little ability to contain the entire device driver in his head.

This leads to the fifth problem, which is that copying repeats all the optimization steps, but elides the measurement and experience that led to them. This violates one of my core design principles, which is KISS (keep it simple, stupid) -- design for simplicity first, and optimize only when proven necessary. (Example: does a 100MB NIC really need to have the logic associated with "buffer loan up" in it? Almost certainly not! And in fact, despite the fact that many NIC drivers have such logic in them, dating back to older architectures, the loan up actually has significant problems, and results in lower performance than the far simpler unconditional bcopy.

A sixth problem is that this approach can often lead to reproduction of design mistakes or limitations. For example, if one were to build a new driver for SPARC hardware by copying hme, the end driver would likely not be 100% portable to x86. (And even if it were made portable, the first pass would probably be quite suboptimal from a performance standpoint owing to hme's use of SPARC-only dvma interfaces.)

The upshot of all this is that I would strongly encourage prospective device driver authors not to start a new driver by copying source code from another driver.

A far better approach, IMO, is to start with a "skeleton driver" (Writing Device Drivers has some such skeleton drivers for different kinds of drivers) and flesh code as you go. Ask questions, on the driver-discuss@opensolaris.org mailing list as well, if you're not familiar with a particular type of driver.

For folks who might argue that starting fresh from a skeleton violates code-reuse and increases development costs, I would say that

  1. The end driver will almost certainly be more maintainable, as the author will understand every line of code.
  2. The cost of developing a new driver from scratch is probably not that high compared to the copy-modify approach (unless the source and target drivers are for devices that are very very similar. )
  3. The educational benefit of taking this approach is very great as well. I can think of no better way to truly understand the DDI/DDK than to write a device driver from scratch.
  4. For commonly reused code, a library with well defined interfaces is a far far better approach. The driver frameworks in Solaris such as SCSA and GLD are good examples of this. (Third party approaches such as Murayama's GEM also qualify in this regard.) This way there is only one real copy of the code, and clear interface boundaries make it possible for multiple parties to reuse the code safely, without depending on "implementation details".
There are some folks who "start from scratch" - I hope I'll see more of them in the future.

1 comment:

Cyril Plisko said...

Garrett,

nice write up. I completely agree with you . I am too, always (well almost) writing my drivers from scratch, despite temptation to just copy one of my previous one. I grew my own template/skeleton library and when just use one of the closest when I am starting from scratch. Needless to say, that templates have no hardware-related parts, only standard autoload/configuration and empty entry points.