I'm diving into multi-threaded programming and thinking about lock-free reference counting using atomic operations.
Obviously, an atomic operation can be slower than its non-atomic counterpart, at least by a constant factor. What worries me is any further synchronization the CPU has to do in order to perform atomic operations.
I wonder whether (and if so, by how much) executing an atomic operation on core A affects the performance of other cores which:
I'm comparing an atomic read-modify-write operation to the corresponding non-atomic operation on modern x86 CPUs.
have nothing to do with core A
No effect.
are executing different threads of the same process as core A
No effect.
are executing an atomic operation
No effect.
are executing an atomic operation and are executing different threads of the same process as core A
No effect.
are executing any memory-related operation, i.e. load, store, ...
No effect.
are executing any memory-related operation in the same memory region (cache line, page?) as core A
The core performing the atomic operation must acquire the cache line exclusively, taking it away from any other cores that hold it in their caches. No other core can access that line until the atomic operation has completed in cache and inter-cache traffic has synchronized the line, leaving it either shared or exclusive in the other core.
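Since the only cross-core effect is on the cache line being modified, one common mitigation is to keep independently updated atomics on separate cache lines. Below is a minimal sketch of that idea, not a benchmark. It assumes 64-byte cache lines (typical for current x86 CPUs, but still an assumption), and the names `PaddedCounter`, `kCacheLine`, `kThreads` and `count_with_padded_counters` are made up for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines, typical for current x86 CPUs.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kThreads = 4;

// Each counter is aligned (and therefore padded) to a full cache line, so an
// atomic increment on one core does not take away the line that holds
// another core's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");

// Hypothetical helper: every thread increments only its own counter; the
// counters are summed after all threads have been joined.
long count_with_padded_counters(long increments_per_thread) {
    std::array<PaddedCounter, kThreads> counters;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    long total = 0;
    for (const auto& c : counters) {
        total += c.value.load(std::memory_order_relaxed);
    }
    return total;
}
```

Without the `alignas`, several counters would likely share one cache line, and each increment would have to pull that line away from whichever core touched it last, even though the threads never touch each other's counters.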
The main cost of atomic operations is to the pipelines of the core executing the atomic instruction. Because the atomic operation must take place all at once at a well-defined point, it (mostly) cannot overlap with other operations. That's a huge penalty for a superscalar CPU that gains performance by keeping lots of instructions in various stages of processing.
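For the reference-counting case from the question, this means the cost is paid once per increment or decrement by the core that executes it. Here is a minimal sketch of such a counter. The class `RefCounted` and its member names are hypothetical, and this is an illustration under those assumptions rather than a complete smart pointer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count, for illustration only.
class RefCounted {
public:
    // Taking another reference: relaxed ordering is enough here because the
    // caller already holds a reference, so the object cannot go away.
    void retain() noexcept {
        refs_.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a reference: returns true if this call released the last
    // reference, in which case the caller is responsible for destroying the
    // object. acq_rel orders earlier uses of the object before destruction.
    bool release() noexcept {
        return refs_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    long use_count() const noexcept {
        return refs_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> refs_{1};  // the creator holds the first reference
};
```

On x86 both `fetch_add` and `fetch_sub` compile to a locked read-modify-write instruction regardless of the memory order requested, so each `retain()`/`release()` pair pays the pipeline cost described above twice. Passing the object by reference, or moving ownership instead of copying it, avoids those operations where shared ownership does not actually change.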