Kernel Scheduling for 1024 CPUs

Tags: multithreading, linux-kernel, kernel, bsd

Azul Systems has an appliance that supports thousands of cache-coherent CPUs. I would love insight into what changes an operating system would need in order to schedule thousands of simultaneously running threads.

223

asked Apr 10 '09 20:04

McGovernTheory

3 Answers

Scheduling thousands of threads is not a big deal, but scheduling them on hundreds of CPUs is. What you need, first and foremost, is very fine-grained locking or, better yet, lock-free data structures and algorithms. You just can't afford to let 200 CPUs wait while one CPU executes a critical section.
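As a minimal illustration of the lock-free idea, here is a sketch using C11 atomics of a multi-producer list: any number of CPUs can push without taking a lock, and a single consumer detaches everything with one atomic exchange. The names (lf_list, lf_push, lf_drain) are hypothetical; the pattern is similar in spirit to the Linux kernel's llist (llist_add/llist_del_all), but this is not that code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type: the payload is just an int here. */
struct lf_node {
    struct lf_node *next;
    int value;
};

/* Head of a NULL-terminated, lock-free singly linked list. */
struct lf_list {
    _Atomic(struct lf_node *) head;
};

/* Push from any CPU without a lock: link the node in front of the
 * current head and retry the compare-and-swap until no other CPU has
 * changed the head in between. */
static void lf_push(struct lf_list *list, struct lf_node *node)
{
    assert(node != NULL);
    struct lf_node *old = atomic_load_explicit(&list->head, memory_order_relaxed);
    do {
        node->next = old;   /* on CAS failure, old is refreshed automatically */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list->head, &old, node,
                 memory_order_release, memory_order_relaxed));
}

/* Detach every queued node in one atomic exchange. Since nodes are never
 * removed one at a time, the classic ABA problem of lock-free pops does
 * not arise. Nodes come back newest first. */
static struct lf_node *lf_drain(struct lf_list *list)
{
    return atomic_exchange_explicit(&list->head, NULL, memory_order_acquire);
}
```

A failed compare-and-swap only means another CPU got there first, so a pusher retries immediately instead of sleeping on a lock while others hold it.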

182

answered Oct 23 '22 11:10

Mark Probst

You're asking for possible changes to the OS, so I presume there's a significant engineering team behind this effort.

A few pieces of clarifying information would also help define the problem parameters:

How much IPC (inter-process communication) do you need?
Do they really have to be threads, or can they be processes?
If they're processes, is it okay if they have to talk to each other through sockets rather than shared memory?
What is the memory architecture? Are you straight SMP with 1024 cores, or is there some NUMA (Non-Uniform Memory Access) or MPP (massively parallel processing) arrangement going on here? What are your page tables like?

Knowing very little about Azul Systems, I would guess that you have very little IPC, and that a simple "run one kernel per core" model might actually work out just fine. If processes need to talk to each other, they can create sockets and transfer data that way. Does your hardware support this model? (You would likely also need one IP address per core, and 1024 IP addresses might be troublesome, although they could all be NAT'd, so maybe it's not such a big deal.) Of course, this model would lead to some inefficiencies, such as extra page tables and a fair bit of RAM overhead, and it may not even be supported by your hardware.
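Talking through sockets instead of shared memory might look like the sketch below. A single OS instance can't demonstrate separate kernels, so as an assumption it uses an AF_UNIX socketpair between a parent and a forked child as a local stand-in for the TCP connections that separate per-core kernels would need; send_via_socket is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg from a child process to its parent over a socket rather than
 * shared memory. Returns the number of bytes received, or -1 on error. */
static ssize_t send_via_socket(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    assert(outlen > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: the sender */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t n = write(fds[1], msg, len);
        close(fds[1]);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                       /* parent: the receiver */
    size_t total = 0;
    while (total < outlen - 1) {
        ssize_t n = read(fds[0], out + total, outlen - 1 - total);
        if (n <= 0)                      /* 0 = sender closed, <0 = error */
            break;
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Across real kernel instances the same read/write loop would run over a TCP socket. The data is copied rather than shared, which is a cost the shared-memory model avoids.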

Even if "1 kernel per core" doesn't work, you could likely run 1024/8 = 128 kernels and be just fine, letting each kernel control 8 physical CPUs.

That said, if you wanted to run 1 thread per core on a traditional SMP machine with 1024 cores (and only a few physical CPUs), then I would expect the old-fashioned O(1) scheduler to be what you want. Your CPU[0] will likely end up spending nearly 100% of its time in the kernel doing interrupt handling, but that's just fine for this use case, unless you need more than 1 core to handle your workload.
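The "O(1)" in the O(1) scheduler (used in Linux 2.6 until CFS replaced it in 2.6.23) comes from keeping a FIFO queue per priority level plus a bitmap of non-empty levels, so choosing the next task is a find-first-set over a fixed-size bitmap no matter how many tasks are runnable. A simplified sketch, assuming 140 priority levels and a single priority array (the real scheduler kept an active and an expired array per CPU and swapped them); the names are hypothetical and __builtin_ctzll is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPRIO  140                    /* priority levels, 0 = highest */
#define NWORDS ((NPRIO + 63) / 64)    /* 64-bit words in the bitmap */

struct task {
    struct task *next;
    int prio;
    int id;
};

/* One priority array: a FIFO queue per level plus a bitmap whose bit p
 * is set exactly when queue p is non-empty. */
struct prio_array {
    uint64_t bitmap[NWORDS];
    struct task *head[NPRIO];
    struct task *tail[NPRIO];
};

static void enqueue(struct prio_array *a, struct task *t)
{
    assert(t->prio >= 0 && t->prio < NPRIO);
    t->next = NULL;
    if (a->tail[t->prio])
        a->tail[t->prio]->next = t;
    else
        a->head[t->prio] = t;
    a->tail[t->prio] = t;
    a->bitmap[t->prio / 64] |= UINT64_C(1) << (t->prio % 64);
}

/* Scan at most NWORDS words for the lowest set bit (highest priority),
 * then pop the head of that level's queue: constant time overall. */
static struct task *pick_next(struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++) {
        if (a->bitmap[w] == 0)
            continue;
        int prio = w * 64 + __builtin_ctzll(a->bitmap[w]);
        struct task *t = a->head[prio];
        a->head[prio] = t->next;
        if (a->head[prio] == NULL) {      /* level now empty: clear its bit */
            a->tail[prio] = NULL;
            a->bitmap[w] &= ~(UINT64_C(1) << (prio % 64));
        }
        return t;
    }
    return NULL;                          /* nothing runnable */
}
```

Tasks at the same priority run round-robin because each level is a FIFO.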

answered Oct 23 '22 11:10

slacy

Making Linux scale has been a long and ongoing project. The first multiprocessor-capable Linux kernel had a single lock protecting the entire kernel (the Big Kernel Lock, BKL), which was simple but limited scalability.

Subsequently, the locking has been made more fine-grained: there are many locks (thousands?), each covering only a small portion of the data. However, there are limits to how far this can be taken, as fine-grained locking tends to be complicated, and the locking overhead starts to eat up the performance benefit, especially considering that most multi-CPU Linux systems have relatively few CPUs.
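A minimal sketch of the fine-grained approach, assuming a hash table as the shared structure: each bucket gets its own pthread mutex instead of one table-wide lock, so CPUs touching different buckets never contend. The names (table, table_insert, table_lookup) and NBUCKETS are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 64

struct entry {
    struct entry *next;
    unsigned key;
    int value;
};

/* One mutex per bucket rather than one (BKL-style) lock for the table. */
struct table {
    pthread_mutex_t lock[NBUCKETS];
    struct entry *bucket[NBUCKETS];
};

static void table_init(struct table *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&t->lock[i], NULL);
        t->bucket[i] = NULL;
    }
}

/* Only the bucket that owns this key is locked. */
static void table_insert(struct table *t, struct entry *e)
{
    assert(e != NULL);
    unsigned b = e->key % NBUCKETS;
    pthread_mutex_lock(&t->lock[b]);
    e->next = t->bucket[b];
    t->bucket[b] = e;
    pthread_mutex_unlock(&t->lock[b]);
}

/* Returns 1 and stores the value if key is present, else returns 0. */
static int table_lookup(struct table *t, unsigned key, int *value)
{
    unsigned b = key % NBUCKETS;
    int found = 0;
    pthread_mutex_lock(&t->lock[b]);
    for (struct entry *e = t->bucket[b]; e != NULL; e = e->next) {
        if (e->key == key) {
            *value = e->value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock[b]);
    return found;
}
```

The trade-off shows up even at this scale: 64 mutexes cost memory and setup, and any operation spanning two buckets would need a lock-ordering rule to avoid deadlock.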

Another thing is that, as far as possible, the kernel uses per-CPU data structures. This is very important, as it avoids the cache-coherency costs of shared data, and of course there is no locking overhead. For example, every CPU runs its own process scheduler, requiring only occasional global synchronization.
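Per-CPU data in its simplest form is a counter split into one slot per CPU, each on its own cache line, and summed only when someone reads it. This userspace sketch assumes 8 CPUs and a 64-byte cache line, and uses the thread index as a stand-in for the CPU number (the threads are not actually pinned); the kernel's own per-CPU machinery (e.g. DEFINE_PER_CPU) works differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>

#define NCPUS     8     /* hypothetical CPU count */
#define CACHELINE 64    /* assumed cache-line size in bytes */

/* Each slot is aligned to its own cache line, so increments on different
 * CPUs never bounce the same line between caches. */
struct percpu_counter {
    alignas(CACHELINE) atomic_long count;
};

static struct percpu_counter counters[NCPUS];

/* Fast path: touch only this CPU's slot. Relaxed ordering suffices
 * because only the eventual total matters. */
static void counter_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    atomic_fetch_add_explicit(&counters[cpu].count, 1, memory_order_relaxed);
}

/* Slow path: an occasional global read sums every slot; a concurrent
 * reader may see a slightly stale total. */
static long counter_sum(void)
{
    long sum = 0;
    for (int i = 0; i < NCPUS; i++)
        sum += atomic_load_explicit(&counters[i].count, memory_order_relaxed);
    return sum;
}

struct worker_arg { int cpu; int iters; };

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++)
        counter_inc(a->cpu);
    return NULL;
}

/* Run one thread per simulated CPU, then read the combined total. */
static long run_workers(int iters)
{
    pthread_t tid[NCPUS];
    struct worker_arg args[NCPUS];
    for (int i = 0; i < NCPUS; i++) {
        args[i] = (struct worker_arg){ i, iters };
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NCPUS; i++)
        pthread_join(tid[i], NULL);
    return counter_sum();
}
```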

Also, some algorithms are chosen with scalability in mind. For example, some read-mostly data is protected by Read-Copy-Update (RCU) instead of traditional mutexes; this allows readers to proceed during a concurrent update.
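The core RCU idea (readers never block, while a writer publishes a new copy and frees the old one only after a grace period) can be sketched in userspace with C11 atomics. This is a toy that assumes a fixed number of reader slots, no nested read sections, and only one writer at a time. The function names borrow the Linux kernel's rcu_read_lock/synchronize_rcu, but the kernel's implementation is entirely different, and its rcu_read_lock takes no argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_READERS 4   /* hypothetical fixed set of reader slots */

struct config { int value; };

static _Atomic(struct config *) current;       /* RCU-protected pointer */
static atomic_ulong gp_ctr = 1;                /* grace-period counter, never 0 */
static atomic_ulong reader_ctr[MAX_READERS];   /* 0 = not in a read section */

/* Record which grace period this reader started in. */
static void rcu_read_lock(int r)
{
    assert(r >= 0 && r < MAX_READERS);
    atomic_store(&reader_ctr[r], atomic_load(&gp_ctr));
}

static void rcu_read_unlock(int r)
{
    atomic_store(&reader_ctr[r], 0);
}

/* Start a new grace period, then wait for every reader that began in an
 * older one (and so might still hold the old pointer) to finish. */
static void synchronize_rcu(void)
{
    unsigned long new_gp = atomic_fetch_add(&gp_ctr, 1) + 1;
    for (int r = 0; r < MAX_READERS; r++) {
        unsigned long c;
        while ((c = atomic_load(&reader_ctr[r])) != 0 && c != new_gp)
            ;   /* spin; a real implementation would sleep or yield */
    }
}

/* Reader: never waits, even while an update is in progress. */
static int read_value(int r)
{
    rcu_read_lock(r);
    struct config *c = atomic_load(&current);
    int v = c ? c->value : -1;
    rcu_read_unlock(r);
    return v;
}

/* Writer: copy, modify, publish, wait a grace period, free the old copy. */
static void update_value(int v)
{
    struct config *old = atomic_load(&current);
    struct config *fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    *fresh = old ? *old : (struct config){ 0 };
    fresh->value = v;
    atomic_store(&current, fresh);
    synchronize_rcu();
    free(old);
}
```

All the waiting happens in synchronize_rcu on the writer side; readers perform a handful of atomic loads and stores and never wait, which is why RCU suits read-mostly data.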

As for memory, Linux tries hard to allocate memory from the NUMA node on which the process is running. This gives applications better memory bandwidth and latency.
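Under Linux's default policy a page is placed on the node of the CPU that first touches it, so a common userspace pattern is to let each worker thread allocate and initialize its own buffer rather than having the main thread do it. A sketch under that assumption; the worker count, buffer size, and names are hypothetical, and because the threads are not pinned, the placement is a tendency rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NWORKERS  4
#define BUF_BYTES (1 << 20)   /* 1 MiB per worker */

struct worker {
    pthread_t tid;
    unsigned char *buf;
};

/* Each worker allocates AND first touches its own buffer, so its pages
 * tend to land on the node of the CPU running this thread rather than
 * on the node of whichever thread spawned the workers. */
static void *worker_main(void *p)
{
    struct worker *w = p;
    w->buf = malloc(BUF_BYTES);
    if (w->buf)
        memset(w->buf, 0xAB, BUF_BYTES);   /* first touch happens here */
    return NULL;
}

/* Start the workers and wait for them; returns how many were started. */
static int spawn_workers(struct worker ws[NWORKERS])
{
    assert(ws != NULL);
    int started = 0;
    for (; started < NWORKERS; started++)
        if (pthread_create(&ws[started].tid, NULL, worker_main, &ws[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(ws[i].tid, NULL);
    return started;
}
```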

answered Oct 23 '22 10:10

janneb
