Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How come INC instruction of x86 is not atomic? [duplicate]

I've read that INC instruction of x86 is not atomic. My question is how come? Suppose we are incrementing a 64 bit integer on x86-64, we can do it with one instruction, since INC instruction works with both memory variables and register. So how come its not atomic?

like image 574
pythonic Avatar asked Apr 11 '12 16:04

pythonic


People also ask

Are x86 instructions Atomic?

x86 guarantees that aligned loads and stores up to 64 bits are atomic, but not wider accesses.

Is increment operation an atomic?

The increment-memory machine instruction on an X86 is atomic only if you use it with a LOCK prefix. x++ in C and C++ doesn't have atomic behavior.

Are all assembly instructions Atomic?

No you cannot assume this. Unless it clearly stated in compiler specification. And moreover no one can guarantee that one single assembler instruction indeed atomic. In practice each assembler instruction is translated to number of microcode operation - uops.

What does it mean for a machine instruction to be atomic?

From the Greek meaning "not divisible into smaller parts" An "atomic" operation is always observed to be done or not done, but never halfway done. An atomic operation must be performed entirely or not performed at all.


3 Answers

Why would it be? The processor core still needs to read the value stored at the memory location, calculate the increment of it, and then store it back. There's a latency between reading and storing, and in the mean time another operation could have affected that memory location.

Even with out-of-order execution, processor cores are 'smart' enough not to trip over their own instructions and wouldn't be responsible for modifying this memory in the time gap. However, another core could have issued an instruction that modifies that location, a DMA transfer could have affected that location, or other hardware touched that memory location somehow.

like image 99
Kaganar Avatar answered Sep 19 '22 02:09

Kaganar


Modern x86 processors as part of their execution pipeline "compile" x86 instructions into a lower-level set of operations; Intel calls these uOps, AMD rOps, but what it boils down to is that certain type of single x86 instructions get executed by the actual functional units in the CPU as several steps.
That means, for example, that:

INC EAX 

gets executed as a single "mini-op" like uOp.inc eax (let me call it that - they're not exposed).
For other operands things will look differently, like:

INC DWORD PTR [ EAX ] 

the low-level decomposition though would look more like:

uOp.load tmp_reg, [ EAX ] uOp.inc tmp_reg uOp.store [ EAX ], tmp_reg 

and therefore is not executed atomically. If on the other hand you prefix by saying LOCK INC [ EAX ], that'll tell the "compile" stage of the pipeline to decompose in a different way in order to ensure the atomicity requirement is met.

The reason for this is of course as mentioned by others - speed; why make something atomic and necessarily slower if not always required ?

like image 22
FrankH. Avatar answered Sep 17 '22 02:09

FrankH.


You really don't want a guaranteed atomic operation unless you need it, from Agner Fog's Software optimization resources: instruction_tables.pdf (1996 – 2017):

Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand.

like image 38
Brett Hale Avatar answered Sep 17 '22 02:09

Brett Hale