Monday, November 10, 2008

locking hints for device drivers

It seems that I often run into the same problems over and over again, and I see many device drivers which often suffer from the same problem. Here are some strategies I use in my own coding -- maybe someone else will find them useful. To my knowledge they have not been documented elsewhere.

  1. KISS. Start with as few locks as you need to get the job done. Only add locks when performance analysis shows you need them. I usually start with an assumption that I'll need one lock per device instance, and possibly a single global lock. (If you don't have global state, you can elide the global lock.) For some kinds of drivers (NICs in particular) I introduce a lock for each major traffic direction (so one lock for rx, and one for tx) per instance.
  2. Global state is evil. Global state requires global locks. Global locks often introduce lock ordering problems, and can also more likely to be contended in systems with lots of devices.
  3. Thou shalt not sleep in STREAMs entry put(9E) and srv(9E) entry points. This one I frequently see violated. These can run in interrupt context. Don't call cv_wait(9F) or its friends here.
  4. Use mutex(9F)es as your lock primitive of choice. rwlock(9F)s are slightly more expensive (and can be more challenging to get read versus write sorted out properly), and semaphore(9F)s can fall prey to priority inversion.
  5. Hold those mutexes for as short as possible. If you can do some work ahead of time, or defer it to later, outside of the time you hold the mutex, you'll greatly reduce opportunities for contention. Uncontended mutexes are good. Contended mutexes are bad. Sometimes small bits of code reordering can have significant performance impact.
  6. lockstat(1M) is your friend. It can really help you to understand what the hot locks in your driver are.
  7. ASSERT(9F) is your friend. You can place assertions up front of functions that have to be called with certain locks held. Its easier to debug assertion failures than it is to debug deadlock or (far far worse) race conditions.
  8. Be uncompromising about order of lock acquisition. Ensure you always, always acquire mutexes in the same order.
  9. Avoid locking side effects. Functions should usually exit with the same locks held that they started with.
  10. Leaf locks are easier to understand. If you can reduce a lock to a leaf lock (one that is never held across functions which do any other locking), then you're guaranteed that the lock is safe.
  11. Avoid assumptions about locks held in the framework. With very few exceptions (bcopy(9F), strlen(9F), etc.) you can generally assume almost any DDI routine you call will need to acquire a lock. Hopefully most of them are leaf locks.
  12. Be very concerned if you think you need to drop a lock only to reacquire it. (Such as dropping a lock to make a call up into the DDI.) This is one of the more frequently encountered problem areas. Locks used to create atomicity are only effective if the atomicity is not broken. (If you drop a lock, you must not assume any of the conditions that held true when it was first acquired remain true when it is reacquired.)
  13. Minimize asynchronous activity. Do you really need to fire off a taskq, or use timeout(9F) to perform that operation? More asynchronous threads is more complexity, and violates principle 1 (KISS).
Of course, there are often good reasons to violate one or more of the principles above. I find that if I try to to adhere to the above principles though, I run into fewer ugly problems in debugging kernel software. So I always do a double take if I think I need to violate one of the principles -- probably about 70-80 percent of the time, that double take leads to a simpler or better approach that doesn't violate the principles, and ultimately saves me lots of pain later.

Tuesday, November 4, 2008

BMC on concurrency

Bryan Cantrill has just posted one of the more excellent blog posts I've seen this year -- he manages to put to words some of my heretofore unvoiced doubts about transactional memory. I'm a big fan of KISS, and both TM and multilevel threading seem to fail on that front.

Saturday, November 1, 2008

fdc suspend/resume support

In another update, I've updated fdc (the floppy disk controller) so that it supports suspend/resume even if a floppy drive is present. There is one caveat though -- namely that the drive must be empty. Still, its far easier to pull a disk out of a drive (if you even have one -- most floppy drives these days probably see very little, if any, use), than to disconnect the drive entirely.

You can download the file from the device drivers download page (look for a file named fdc-2008-11-01.tar.gz). A webrev of the changes is also posted.

For the curious, the reason for the above caveat is best stated by the following explanation extracted from a comment in the code:
Bad, bad, bad things will happen if someone changes the disk in the drive while it is mounted and the system is suspended. We have no way to detect that. (Undetected filesystem corruption. Its akin to changing the boot disk while the system is suspended. Don't do it!)

So we refuse to suspend if there is media present in the drive. This limits some of the usability of suspend/resume, but it certainly avoids this potential filesytem corruption from pilot error. Given the decreasing popularity of floppy media, we don't see this as much of a limitation.
Anyway, I suspect these changes may have little value to laptops, but may help a lot of desktops. It certainly addresses the problem of SUSPEND for my Dell Precision M390.

Update: Thanks to J├╝rgen Keil's testing, we have found a bug, which I've fixed. The webrev and the binaries have been updated accordingly. Look for the 11-04 release instead.

iprb updated with suspend/resume & quiesce

I've updated iprb (Intel Pro/100 Ethernet driver) to support suspend/resume and quiesce (fast reboot). I've also made some other changes, such as updating to the latest microcode from Intel, fixing some potential races, and improving the internal implementation of the statistics routine.

I'd like folks to try it out on their Intel or AMD platforms. The tarball can be downloaded from here (look for iprb-2008-11-01.tar.gz).

I still want to get this open sourced, but I haven't got approval from the lawyers yet. But here are the binary bits in case anyone is waiting for them.

(I'm not planning on porting this driver to SPARC. There are too many places in the code that would have to be changed to use endian-safe calls to ddi_get, and I think there is probably very little demand for a SPARC version of the driver. If you are one of the people who do want SPARC support for iprb, please let me know.