In x86 assembly there is a LOCK prefix that can be applied to an instruction to make it atomic. Is this atomicity across all cores? What is the usual delay involved? For a regular mutex, what instruction is it that's locked?
Thanks. PS: I was taught that on systems that lack such an instruction that mutexes can still be done but it is more laborious. I wonder if anyone does it that way any more.
A mutex is the starting point for a critical section, which uses a mutex internally to see if it can enter a section of code. If the mutex is free, it sets the mutex and executes the code, only to release the mutex when done.
A mutex is initialized in the beginning of the main function. The same mutex is locked in the 'trythis()' function while using the shared resource 'counter'. At the end of the function 'trythis()' the same mutex is unlocked. At the end of the main function when both the threads are done, the mutex is destroyed.
The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible.
The idea behind mutexes is to only allow one thread access to a section of memory at any one time. If one thread locks the mutex, any other lock attempts will block until the first one unlocks. However, how is this implemented? To lock itself, the mutex has to set a bit somewhere that says that it is locked.
On x86 lock prefix locks all cores and allows atomicity. For implementing this on other systems without LOCK
, CMPXCHG
loops are used or tight loops with retry logic that attempt to set the value of something to the expected value. As you can see the 2nd way is more detrimental in most cases as it's just constantly looping trying to set the value (and keeps doing so until it's done). For x86 the delay is minimal and might range from halting the instruction pipeline or flushing it and then executing that instruction atomically (typically a couple of cycles), the 2nd method can't really be estimated as it depends how much contention there is for the value that needs to be modified atomically. For mutex's I believe that it's a flag value that must be acquired (check that the mutex isn't acquired and continuously wait until the mutex is up for grabs then attempt to atomically change a flag to acquire it).
AFAIK IBM processors use the 2nd method as when working on the linux kernel I found that the atomic increment function uses it (maybe it's only for older processors). The x86 platform still uses
lock addl ...;
I will admit though that it's been about a year since I worked in this part of the kernel so I could be wrong on some points.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With