Monday, November 10, 2008

locking hints for device drivers

It seems that I often run into the same problems over and over again, and I see many device drivers which often suffer from the same problem. Here are some strategies I use in my own coding -- maybe someone else will find them useful. To my knowledge they have not been documented elsewhere.

  1. KISS. Start with as few locks as you need to get the job done. Only add locks when performance analysis shows you need them. I usually start with an assumption that I'll need one lock per device instance, and possibly a single global lock. (If you don't have global state, you can elide the global lock.) For some kinds of drivers (NICs in particular) I introduce a lock for each major traffic direction (so one lock for rx, and one for tx) per instance.
  2. Global state is evil. Global state requires global locks. Global locks often introduce lock ordering problems, and can also more likely to be contended in systems with lots of devices.
  3. Thou shalt not sleep in STREAMs entry put(9E) and srv(9E) entry points. This one I frequently see violated. These can run in interrupt context. Don't call cv_wait(9F) or its friends here.
  4. Use mutex(9F)es as your lock primitive of choice. rwlock(9F)s are slightly more expensive (and can be more challenging to get read versus write sorted out properly), and semaphore(9F)s can fall prey to priority inversion.
  5. Hold those mutexes for as short as possible. If you can do some work ahead of time, or defer it to later, outside of the time you hold the mutex, you'll greatly reduce opportunities for contention. Uncontended mutexes are good. Contended mutexes are bad. Sometimes small bits of code reordering can have significant performance impact.
  6. lockstat(1M) is your friend. It can really help you to understand what the hot locks in your driver are.
  7. ASSERT(9F) is your friend. You can place assertions up front of functions that have to be called with certain locks held. Its easier to debug assertion failures than it is to debug deadlock or (far far worse) race conditions.
  8. Be uncompromising about order of lock acquisition. Ensure you always, always acquire mutexes in the same order.
  9. Avoid locking side effects. Functions should usually exit with the same locks held that they started with.
  10. Leaf locks are easier to understand. If you can reduce a lock to a leaf lock (one that is never held across functions which do any other locking), then you're guaranteed that the lock is safe.
  11. Avoid assumptions about locks held in the framework. With very few exceptions (bcopy(9F), strlen(9F), etc.) you can generally assume almost any DDI routine you call will need to acquire a lock. Hopefully most of them are leaf locks.
  12. Be very concerned if you think you need to drop a lock only to reacquire it. (Such as dropping a lock to make a call up into the DDI.) This is one of the more frequently encountered problem areas. Locks used to create atomicity are only effective if the atomicity is not broken. (If you drop a lock, you must not assume any of the conditions that held true when it was first acquired remain true when it is reacquired.)
  13. Minimize asynchronous activity. Do you really need to fire off a taskq, or use timeout(9F) to perform that operation? More asynchronous threads is more complexity, and violates principle 1 (KISS).
Of course, there are often good reasons to violate one or more of the principles above. I find that if I try to to adhere to the above principles though, I run into fewer ugly problems in debugging kernel software. So I always do a double take if I think I need to violate one of the principles -- probably about 70-80 percent of the time, that double take leads to a simpler or better approach that doesn't violate the principles, and ultimately saves me lots of pain later.

No comments: