I have used C++ for a long time, and now I'm starting to learn assembly and how processors work (not just for fun; I have to as part of a test program). While learning assembly, I started running into terms that I also hear in discussions of multithreading, which I do a lot of in scientific computing. I'm struggling to get the full picture, and I'd appreciate help filling it in.
I learned that a bus, in its simplest form, is something like a multiplexer followed by a demultiplexer. Each of the ends takes an address as input, in order to connect the two ends with some external component. The two ends can, based on the address, point to memory, graphics card, RAM, CPU registers, or anything else.
Now getting to my question: I keep hearing people argue about whether to use a mutex or an atomic for thread safety (I know there's no ultimate answer; my question is about the comparison itself). Here for example, the claim was made that atomics are so bad that they will prevent a processor from doing a decent job because of bus-locking.
Could someone please explain what bus-locking is, in a little detail, and why it is not like a mutex, given that, AFAIK, a mutex needs at least two atomic operations to lock and unlock?
Intel's choice was to lock the whole memory bus to solve the coherency problem; the processor locks the bus for the duration of the operation, meaning that no other CPUs or devices can access it. The split lock blocks not only the CPU performing the access, but also all others in the system.
atomic<T> variables don't use locks (at least where T is natively atomic on your platform), but they're not lock-free in the sense above. You might use them in the implementation of a lock-free container, but they're not sufficient on their own.
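To make the point about atomic<T> concrete, here is a minimal sketch of a shared counter incremented from several threads. The names count_with_atomic and long_atomic_is_lock_free are made up for illustration. Whether each increment ends up as a single locked instruction depends on the platform and compiler; is_lock_free() only reports whether the library had to fall back to an internal lock for that type.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared atomic counter.
long count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // One atomic read-modify-write per iteration; no mutex involved.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}

// Hypothetical helper: reports whether std::atomic<long> avoids an internal lock here.
bool long_atomic_is_lock_free() {
    std::atomic<long> probe{0};
    return probe.is_lock_free();
}
```

Calling count_with_atomic(4, 10000) should return 40000 regardless of how the threads interleave, because no increment can be lost.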
User-level locks use the processor's atomic instructions to atomically update a memory location. These instructions carry a lock prefix and have a memory address as their destination operand.
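As a rough illustration of a user-level lock built on atomic instructions, here is a minimal spinlock sketch using std::atomic_flag. SpinLock and count_with_spinlock are hypothetical names, and this is not how std::mutex is actually implemented (real mutexes usually also ask the OS to put waiting threads to sleep). It does show the two atomic operations mentioned in the question: one read-modify-write to acquire and one store to release.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Atomic read-modify-write: set the flag and learn its previous value.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // someone else holds the lock; back off
        }
    }
    void unlock() {
        // Atomic store with release ordering: publish our writes, free the lock.
        flag_.clear(std::memory_order_release);
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: a plain (non-atomic) counter protected by the spinlock.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;  // safe only because the lock serializes access
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

The design point is that the lock itself is just an atomic variable; the difference from using an atomic counter directly is that the protected section can be arbitrarily long.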
An atomic access is a term for a series of accesses to a memory region. Atomic accesses are used by managers when they would like to perform a sequence of accesses to a particular memory region, while being sure that the original data in the region are not corrupted by writes from other managers.
From Intel® 64 and IA-32 Architectures Software Developer’s Manual:
Beginning with the P6 family processors, when the LOCK prefix is prefixed to an instruction and the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted. Instead, only the processor’s cache is locked. Here, the processor’s cache coherency mechanism ensures that the operation is carried out atomically with regards to memory.
There are special non-temporal store instructions to bypass the cache. All other loads and stores normally go through the cache, unless the memory page is marked as non-cacheable (like GPU or PCIe device memory).
"I learned that a bus, in its simplest form, is something like a multiplexer followed by a demultiplexer. Each of the ends"
Well, that's not correct. In its simplest form there's nothing to multiplex or demultiplex. It's just two things talking directly to each other. And in the not-so-simple case, a bus may have three or more devices connected. In that case, you start needing bus addresses because you can no longer talk about "the other end".
Now if you've got multiple devices on a single bus, they generally can't all talk at the same time. There must be some mechanism to prevent them from all talking at the same time. Yet for all devices to be able to share that bus, they must be able to alternate who is talking to who. Bus locking as a broad term means any deviation from the usual pattern, where two devices reserve the bus for their mutual conversation.
In the particular context of the x86 memory bus, this means keeping the bus locked during a read-modify-write cycle (as Kerrek SB pointed out in comments). Now this may sound like a simple bus with 2 devices (memory and CPU) but DMA and multi-core chips make this not that simple.
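From the C++ side, a read-modify-write cycle is what you get whenever an atomic operation has to read a value, compute a new one, and write it back without anyone else slipping in between. A minimal sketch using a compare-exchange loop follows; raise_to_at_least is a hypothetical helper name, and how the hardware makes the exchange atomic (bus lock, cache-line lock, or something else) is up to the platform.

```cpp
#include <atomic>

// Hypothetical helper: atomically raise `target` to at least `candidate`.
// Returns the value that was last observed before any update took place.
int raise_to_at_least(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // Read, decide, then try to write; retry if another thread changed the
    // value between our read and our write.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // On failure, compare_exchange_weak refreshes `observed` with the
        // current value, so the loop re-evaluates the condition.
    }
    return observed;
}
```

For example, with std::atomic<int> v{3}, calling raise_to_at_least(v, 7) returns 3 and leaves v at 7; a later raise_to_at_least(v, 5) returns 7 and changes nothing.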