Critical sections with multicore processors

With a single-core processor, where all your threads run on the one CPU, implementing a critical section with an atomic test-and-set operation on some mutex (or semaphore, etc.) in memory seems straightforward enough: because the processor is executing a test-and-set from one spot in your program, it can't simultaneously be executing one from another spot in your program disguised as some other thread.

But what happens when you actually have more than one physical processor? It seems that simple instruction-level atomicity wouldn't be sufficient, because with two processors potentially executing their test-and-set operations at the same time, what you really need to keep atomic is access to the shared memory location of the mutex. (And if the shared memory location is loaded into cache, there's the whole cache coherency thing to deal with, too.)

This seems like it would incur far more overhead than the single core case, so here's the meat of the question: How much worse is it? Is it worse? Do we just live with it? Or sidestep it by enforcing a policy that all threads within a process group have to live on the same physical core?

393

asked Jun 11 '09 11:06

JustJeff

4 Answers

Multi-core/SMP systems are not just several CPUs glued together; there is explicit hardware support for doing things in parallel. All the synchronization primitives are implemented with the help of hardware instructions along the lines of an atomic CAS (compare-and-swap). Such an instruction either locks the bus shared by the CPUs and the memory controller (and devices that do DMA) and updates the memory, or just updates the memory, relying on cache snooping. This in turn causes the cache coherency protocol to kick in, forcing the other caches that hold the affected cache line to invalidate (or update) their copies.

Disclaimer: this is a very basic description. There are more interesting things here, like virtual vs. physical caches, cache write-back policies, memory models, fences, etc.
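To make the "primitives built on hardware atomics" point a bit more concrete, here is a minimal sketch of a spinlock in C++. It assumes a C++11 compiler; SpinLock and count_with_spinlock are made-up names for illustration, not anything from a real library. The atomic exchange plays the role of the test-and-set from the question, and real mutexes do considerably more than this (blocking, fairness, backoff).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock built on an atomic exchange (a test-and-set).
class SpinLock {
public:
    void lock() {
        // exchange() atomically writes true and returns the previous value.
        // We own the lock only if the previous value was false.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // let other threads run while we wait
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Starts `threads` workers that each increment a plain (non-atomic) counter
// `iterations` times inside the critical section, then returns the total.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling count_with_spinlock(4, 10000) should return 40000 no matter how many cores the threads are spread over, because the hardware guarantees that only one exchange at a time can observe false.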

If you want to know more about how an OS might use these hardware facilities, here's an excellent book on the subject.

181

answered Sep 21 '22 13:09

Nikolai Fetissov

The vendor of a multi-core CPU has to make sure that the different cores coordinate among themselves when executing instructions that guarantee atomic memory access.

On Intel chips, for instance, you have the 'cmpxchg' instruction. It compares the value stored at a memory location to an expected value and, if the two match, replaces it with the new value. If you put the 'lock' prefix in front of it, the operation is guaranteed to be atomic with respect to all cores.
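As a rough illustration of those compare-and-exchange semantics, here is a small C++ sketch using std::atomic. The names try_acquire, release and cas_walkthrough are hypothetical, as is the convention that 0 means free and 1 means held. On x86, compilers such as GCC and Clang typically compile compare_exchange_strong down to a lock cmpxchg, but that is a property of the toolchain, not something this code controls.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical lock word convention: 0 = free, 1 = held.
// Returns true only if this call changed the word from 0 to 1.
bool try_acquire(std::atomic<int>& lock_word) {
    int expected = 0;
    // Writes 1 only if the word still equals `expected`; otherwise the
    // word is left alone and `expected` receives the value actually seen.
    return lock_word.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
}

void release(std::atomic<int>& lock_word) {
    lock_word.store(0, std::memory_order_release);
}

// Single-threaded walk-through of the success and failure cases.
bool cas_walkthrough() {
    std::atomic<int> lock_word{0};
    bool first = try_acquire(lock_word);   // word was 0, so this succeeds
    bool second = try_acquire(lock_word);  // word is now 1, so this fails
    release(lock_word);
    bool third = try_acquire(lock_word);   // free again, so this succeeds
    assert(lock_word.load() == 1);
    return first && !second && third;
}
```

The same try_acquire is safe when called from several cores at once: at most one caller sees the 0 and wins, which is exactly the guarantee the locked instruction provides.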

answered Sep 21 '22 13:09

Tobias

You would need a test-and-set that forces the processor to notify all the other cores of the operation, so that they are aware of it. Yes, that introduces overhead, and you have to live with it. It's a reason to design multithreaded applications so that they don't hit synchronization primitives too often.
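One common way to follow that advice, sketched here in C++ under the assumption that the work can be split into independent pieces: let each thread accumulate into a local variable and take the lock only once, to merge its result. The function name sum_with_local_accumulation is made up for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Each worker sums 1..per_thread into a thread-local variable and takes the
// mutex exactly once to merge, instead of locking on every addition.
long sum_with_local_accumulation(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::mutex total_mutex;
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            long local = 0;  // private to this thread, no synchronization
            for (int i = 1; i <= per_thread; ++i) local += i;
            std::lock_guard<std::mutex> guard(total_mutex);
            total += local;  // the only synchronized step per thread
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

With 4 threads and per_thread = 1000 this takes the lock 4 times rather than 4000 times, and returns 4 * 500500 = 2002000 either way.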

answered Sep 22 '22 13:09

sharptooth

"Or sidestep it by enforcing a policy that all threads within a process group have to live on the same physical core?"

That would defeat the whole point of multithreading. When you use a lock, semaphore, or other synchronization technique, you are relying on the OS to make sure that these operations are interlocked, no matter how many cores you are using.

The time to switch to a different thread after a lock has been released is mostly determined by the cost of a context switch. This SO thread deals with the context switching overhead, so you might want to check that.

There are some other interesting threads also:

What are the differences between various threading synchronization options in C#?
Threading best practices

You should read this MSDN article also: Understanding the Impact of Low-Lock Techniques in Multithreaded Apps.

answered Sep 19 '22 13:09

Groo

Tags:

cpu-architecture

synchronization

multithreading