I figured I'd better post this publicly. If you have a Ricoh SD controller (pci1180,522), you don't want to use SDcard in OpenSolaris with it. You'll probably get panics, memory corruption, etc.
There is a known bug, CR 6797937, that tracks this issue. I've filed an RTI for it, and hope to have the fix pushed later today. You can review the diffs on this webrev, if you're so inclined.
This is known to affect Toshiba Tecra M10, IBM T61, and probably a bunch of others besides.
Sorry to those folks affected, we're trying to drive the fix for this out as quickly as possible. It should be in 109.
Update (Feb 5, 8:19pm PST): The fix for this problem was just pushed into ON. It will be in tonight's nightly build.
Thursday, February 5, 2009
Friday, January 23, 2009
Boomer Updates
Just a quick note: no Boomer beta release this week. Due to a bug which we fixed in ON, but which we are dependent upon, there won't be any formal beta release until after SXCE b107 ships. And it will depend on SXCE b107. (And likewise, when the OpenSolaris package repos are updated to use b107, you'll be able to use OpenSolaris and Boomer together.)
I probably will post an updated code review, and possibly BFU archives, for the folks that want to play with these bits earlier and are willing to deal with BFU. (Hmmm.. does BFU work correctly on OpenSolaris? I've only used it on SXCE.) That posting will probably occur on Monday or Tuesday of next week. Stay tuned.
(Is there any interest in an external Mercurial repo for this stuff? I hadn't been planning on one, but if folks want one and will use it, I'll look into it.)
Saturday, January 17, 2009
Boomer Status Update
I've received a number of e-mails inquiring about Boomer, the next generation audio system for Solaris. I thought I'd take a moment to snapshot where we're at.
The good news:
- We're very nearly done with Phase I. There are still a couple of bugs to fix, but it's looking very promising that we'll have a strong public Beta release later this month, with integration into ON in February or March.
- The release will include multichannel surround across a fairly wide range of devices.
- All the existing Sun audio drivers are supported (except Sun Ray)
- All the drivers except usb are "native" Boomer drivers, with greatly reduced complexity and (hopefully) much better reliability
- We have much better support for adjusting different device settings, either from the CLI or from a GUI. This includes surround settings, special device features (such as 3D enhancements), and even jack retasking on certain codecs. (You can even use this to have 5.1 audio on an older Sun Ultra 20!)
- We also have full support for Solaris features like suspend/resume and quiesce.
The not-so-good news:
- There are some features that will be MIA. SPDIF (digital out) support is very limited, only working with audiohd at present.
- We don't have as many new drivers (yet) as we had hoped.
- Dolby Digital (AC3) support has been pushed out to Phase 2.
- Support for Sun Ray is Phase 2 as well.
- Support for "virtual audio" (where /dev/audio and /dev/dsp are virtual devices that seamlessly choose the "correct" audio device, and redirect audio in response to hotplug events -- even for running applications) is looking questionable for Phase I; it may be a Phase 2 item as well.
- At the moment, all of our drivers run at 48 kHz and use 16-bit audio. We can update (and probably will update) some of them to support codecs that have additional bit width or higher frequency options, but this can be done on an as-needed basis. (So far the vast majority of devices out there support only 48 kHz/16-bit audio, anyway.)
Thursday, December 25, 2008
Boomer Beta Source Code Posted
Boomer, aka PSARC 2008/318 -- the new audio subsystem for Solaris, and the project I've been working on for the last several months -- has released source code for our work in progress. A status update and link to the webrev is available here.
Wednesday, December 3, 2008
Floppy drive suspend/resume fixed
Earlier today I pushed a fix for suspend/resume when the machine has a floppy disk drive. Surprisingly, a lot of modern machines still have floppy disk drives in them, and this fix enables those platforms to participate in the S3 suspend work.
The fix will be in b105.
One such machine is the Dell Precision M390. I imagine there are many other desktop class systems of this nature.
While suspend/resume has been focused largely on laptops (where battery life is a paramount concern), I suspect that enabling suspend-to-RAM for desktop class systems will ultimately have a larger environmental impact, since it is those systems that typically consume the most power. So, perhaps this small change will have a somewhat significant impact.
If this fix enabled suspend/resume on your platform, tell me about it. Include smbios output in your message, too. :-)
Tuesday, December 2, 2008
SDcard Fixes
I've heard about a number of problems from folks trying to use SDcard with Ricoh controllers (such as those found on certain newer Toshiba Tecra laptops), as well as the lack of quiesce(9E) support, and various other problems.
To address these problems, I've made a bunch of changes, and am looking for code reviewers and testers -- the webrev is an open review.
I've made binaries available to anyone who wants to try these out. To try out the new binaries, just untar the sdcard-20081202.tar.gz file in your / directory. You might need to reboot for them to take effect. (If you know how to modunload the drivers from kernel memory, that will work too, provided that you don't have any SDcard media inserted at the time.)
Please let me know any feedback you might have. Thanks.
Monday, November 10, 2008
locking hints for device drivers
It seems that I often run into the same problems over and over again, and I see many device drivers which often suffer from the same problem. Here are some strategies I use in my own coding -- maybe someone else will find them useful. To my knowledge they have not been documented elsewhere.
- KISS. Start with as few locks as you need to get the job done. Only add locks when performance analysis shows you need them. I usually start with an assumption that I'll need one lock per device instance, and possibly a single global lock. (If you don't have global state, you can elide the global lock.) For some kinds of drivers (NICs in particular) I introduce a lock for each major traffic direction (so one lock for rx, and one for tx) per instance.
- Global state is evil. Global state requires global locks. Global locks often introduce lock ordering problems, and are also more likely to be contended in systems with lots of devices.
- Thou shalt not sleep in STREAMS put(9E) and srv(9E) entry points. This one I frequently see violated. These can run in interrupt context. Don't call cv_wait(9F) or its friends here.
- Use mutex(9F)es as your lock primitive of choice. rwlock(9F)s are slightly more expensive (and can be more challenging to get read versus write sorted out properly), and semaphore(9F)s can fall prey to priority inversion.
- Hold those mutexes for as short a time as possible. If you can do some work ahead of time, or defer it to later, outside of the time you hold the mutex, you'll greatly reduce opportunities for contention. Uncontended mutexes are good. Contended mutexes are bad. Sometimes small bits of code reordering can have a significant performance impact.
- lockstat(1M) is your friend. It can really help you to understand what the hot locks in your driver are.
- ASSERT(9F) is your friend. You can place assertions at the front of functions that have to be called with certain locks held. It's easier to debug assertion failures than it is to debug deadlocks or (far far worse) race conditions.
- Be uncompromising about order of lock acquisition. Ensure you always, always acquire mutexes in the same order.
- Avoid locking side effects. Functions should usually exit with the same locks held that they started with.
- Leaf locks are easier to understand. If you can reduce a lock to a leaf lock (one that is never held across functions which do any other locking), then you're guaranteed that the lock is safe.
- Avoid assumptions about locks held in the framework. With very few exceptions (bcopy(9F), strlen(9F), etc.) you can generally assume almost any DDI routine you call will need to acquire a lock. Hopefully most of them are leaf locks.
- Be very concerned if you think you need to drop a lock only to reacquire it. (Such as dropping a lock to make a call up into the DDI.) This is one of the more frequently encountered problem areas. Locks used to create atomicity are only effective if the atomicity is not broken. (If you drop a lock, you must not assume any of the conditions that held true when it was first acquired remain true when it is reacquired.)
- Minimize asynchronous activity. Do you really need to fire off a taskq, or use timeout(9F) to perform that operation? More asynchronous threads mean more complexity, and violate principle 1 (KISS).
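A few of these hints are easier to see in code. What follows is only a rough sketch, and it is userland C using POSIX threads rather than real DDI code, so that it can be compiled and run anywhere. Everything named demo_* is made up for illustration. In a real driver the lock would be a kmutex_t taken with mutex_enter(9F) and mutex_exit(9F), and the ownership check would be ASSERT(mutex_owned(...)) instead of the hand-rolled owner field used here. Ordering two instance locks by address is just one possible way to get a fixed acquisition order; a driver may have a more natural ordering of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance state: one lock per instance, no globals. */
typedef struct demo_dev {
    pthread_mutex_t lock;      /* protects all fields below */
    pthread_t       owner;     /* valid only while owned is true */
    bool            owned;     /* stand-in for mutex_owned(9F) */
    unsigned        pending;   /* queued work items */
    unsigned        completed; /* finished work items */
} demo_dev_t;

void demo_init(demo_dev_t *dev)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->owned = false;
    dev->pending = 0;
    dev->completed = 0;
}

/* Debug aid only: true if the calling thread holds dev->lock. */
static bool demo_lock_held(demo_dev_t *dev)
{
    return dev->owned && pthread_equal(dev->owner, pthread_self());
}

static void demo_enter(demo_dev_t *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->owner = pthread_self();
    dev->owned = true;
}

static void demo_exit(demo_dev_t *dev)
{
    assert(demo_lock_held(dev));
    dev->owned = false;
    pthread_mutex_unlock(&dev->lock);
}

/* Must be called with the lock held; returns with it still held. */
static void demo_complete_one_locked(demo_dev_t *dev)
{
    assert(demo_lock_held(dev));
    dev->pending--;
    dev->completed++;
}

/* Short hold time: expensive preparation belongs before demo_enter(). */
void demo_submit(demo_dev_t *dev, unsigned n)
{
    demo_enter(dev);
    dev->pending += n;
    demo_exit(dev);
}

/* Two locks: always take the lower address first, whatever the call order. */
void demo_move(demo_dev_t *src, demo_dev_t *dst, unsigned n)
{
    if (src == dst)
        return;
    demo_dev_t *first = ((uintptr_t)src < (uintptr_t)dst) ? src : dst;
    demo_dev_t *second = (first == src) ? dst : src;

    demo_enter(first);
    demo_enter(second);
    if (n > src->pending)
        n = src->pending;
    src->pending -= n;
    dst->pending += n;
    demo_exit(second);
    demo_exit(first);
}

/* Drop the lock around the upcall, then re-test state after reacquiring. */
unsigned demo_drain(demo_dev_t *dev, void (*upcall)(void *), void *arg)
{
    unsigned n = 0;

    demo_enter(dev);
    while (dev->pending > 0) {       /* re-evaluated after every reacquire */
        demo_complete_one_locked(dev);
        n++;
        if (upcall != NULL) {
            demo_exit(dev);          /* never call out with the lock held */
            upcall(arg);
            demo_enter(dev);         /* nothing seen earlier is assumed true */
        }
    }
    demo_exit(dev);
    return n;
}
```

A caller would demo_init() each instance once, then use demo_submit(), demo_move() and demo_drain() from any thread. Note that demo_move(a, b, ...) and demo_move(b, a, ...) take the two locks in the same order, so two threads moving work in opposite directions cannot deadlock against each other.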