Yesterday I saw yet another argument about the Linux vs. Solaris scalability debate. The Linux fans were loudly proclaiming that the claim of Solaris' superior scalability is FUD in the presence of evidence like the Cray XT class of systems which utilize thousands of processors in a system, running Linux.
The problem with comparing (or even considering!) the systems in the Top500 supercomputers when talking about "scalability" is simply that those systems are irrelevant for the typical "scalability" debate -- at least as it pertains to operating system kernels.
Irrelevant?! Yes. Irrelevant. Let me explain.
First, one must consider the typical environment and problems that are dealt with in the HPC arena. In HPC (High Performance Computing), scientific problems are considered that are usually fully compute bound. That is to say, they spend a huge majority of their time in "user" and only a minuscule tiny amount of time in "sys". I'd expect to find very, very few calls to inter-thread synchronization (like mutex locking) in such applications.
Second, these systems are used by users who are willing, expect, and often need, to write custom software to deal with highly parallel architectures. The software deployed into these environments is tuned for use in situations where the synchronization cost between processors is expected to be "relatively" high. Granted the architectures still attempt to minimize such costs, using very highly optimized message passing busses and the like.
Third many of these systems (most? all?) are based on systems that don't actually run a single system image. There is not a single universal addressable memory space visible to all processors -- at least not without high NUMA costs requiring special programming to deliver good performance, and frequently not at all. In many ways, these systems can be considered "clusters" of compute nodes around a highly optimized network. Certainly, programming systems like the XT5 is likely to be similar in many respects to programming software for clusters using more traditional network interconnects. An extreme example of this kind of software is SETI@home, where the interconnect (the global Internet) can be extremely slow compared to compute power.
So why does any of this matter?
It matters because most traditional software is designed without NUMA-specific optimizations, or even cluster-specific optimizations. More traditional software used in commercial applications like databases, web servers, business logic systems, or even servers for MMORPGs spend a much larger percentage of their time in the kernel, either performing some fashion of I/O or inter-thread communication (including synchronization like mutex locks and such.)
Consider a massive non-clustered database. (Note that these days many databases are designed for clustered operation.) In this situation, there will be some kind of central coordinator for locking and table access, and such, plus a vast number of I/O operations to storage, and a vast number of hits against common memory. These kinds of systems spend a lot more time doing work in the operating system kernel. This situation is going to exercise the kernel a lot more fully, and give a much truer picture of "kernel scalability" -- at least as the arguments are made by the folks arguing for or against Solaris or Linux superiority.
Solaris aficionados claim it is more scalable in handling workloads of this nature -- that a single SMP system image supporting traditional programming approaches (e.g. a single monolithic process made up of many threads for example) will experience better scalability on a Solaris system than on a Linux system.
I've not measured it, so I can't say for sure. But having been in both kernels (and many others), I can say that the visual evidence from reading the code is that Solaris seems like it ought to scale better in this respect than any of the other commonly available free operating systems. If you don't believe me, measure it -- and post your results online. It would be wonderful to have some quantitative data here.
Linux supporters, please, please stop pointing at the Top500 as evidence for Linux claims of superior scalability though. If there are some more traditional commercial kinds of single-system deployments that can support your claim, then lets hear about them!