I've read that the INC instruction of x86 is not atomic. My question is: how come? Suppose we are incrementing a 64-bit integer on x86-64. We can do it with one instruction, since the INC instruction works with both memory operands and registers. So how come it's not atomic?
x86 guarantees that aligned loads and stores up to 64 bits are individually atomic, but an increment of memory is a read-modify-write, which is more than a single load or store.
The increment-memory machine instruction on x86 is atomic only if you use it with a LOCK prefix, and x++ in C and C++ does not have atomic behavior.
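To make that distinction concrete, here is a minimal C++ sketch (my own illustration, not from the original answer): compilers typically turn std::atomic's fetch_add into a LOCK-prefixed add or inc on x86-64, while the plain increment stays an unlocked read-modify-write.

#include <atomic>

std::atomic<long long> atomic_counter{0};
long long plain_counter = 0;

void bump() {
    ++plain_counter;              // not atomic: separate load, add, store
    atomic_counter.fetch_add(1);  // atomic: usually emitted as a LOCK-prefixed add on x86-64
}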
No, you cannot assume this unless it is clearly stated in the compiler's specification. Moreover, no one can guarantee that a single assembly instruction is atomic: in practice, each assembly instruction is translated into a number of micro-operations (uops).
From the Greek meaning "not divisible into smaller parts": an "atomic" operation is always observed to be either done or not done, never halfway done. An atomic operation must be performed entirely or not performed at all.
Why would it be? The processor core still needs to read the value stored at the memory location, increment it, and then store it back. There's a latency between reading and storing, and in the meantime another operation could have affected that memory location.
Even with out-of-order execution, processor cores are 'smart' enough not to trip over their own instructions and wouldn't be responsible for modifying this memory in the time gap. However, another core could have issued an instruction that modifies that location, a DMA transfer could have affected that location, or other hardware could have touched that memory location somehow.
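A minimal sketch of that lost-update scenario (an illustrative C++ example, not part of the original answer; strictly speaking the unsynchronized increment is a data race and thus undefined behaviour, shown here only to demonstrate the interleaving):

#include <iostream>
#include <thread>

int counter = 0;  // plain int: ++counter is a non-atomic load/increment/store

void worker() {
    for (int i = 0; i < 1000000; ++i)
        ++counter;  // two threads can read the same old value and both write old+1
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    std::cout << counter << '\n';  // expected 2000000, but usually prints less
}

With two threads racing on the same counter, increments from the two cores overwrite each other, so the final count typically comes out well below 2000000.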
Modern x86 processors, as part of their execution pipeline, "compile" x86 instructions into a lower-level set of operations; Intel calls these uOps, AMD rOps, but what it boils down to is that certain types of single x86 instructions get executed by the actual functional units in the CPU as several steps.
That means, for example, that:
INC EAX
gets executed as a single "mini-op" like uOp.inc eax
(let me call it that - they're not exposed).
For other operands things look different. Take:
INC DWORD PTR [ EAX ]
The low-level decomposition would look more like:
uOp.load tmp_reg, [ EAX ]
uOp.inc tmp_reg
uOp.store [ EAX ], tmp_reg
and therefore is not executed atomically. If, on the other hand, you prefix the instruction and write LOCK INC DWORD PTR [ EAX ], that'll tell the "compile" stage of the pipeline to decompose it in a different way in order to ensure the atomicity requirement is met.
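For completeness, a hedged sketch of how you might request the LOCK-prefixed form from C++ (the locked_inc helper name and the inline-asm approach are my own illustration; in portable code you would normally rely on std::atomic or the __atomic builtins and let the compiler emit the LOCK prefix for you):

#include <atomic>

// Hypothetical helper: explicitly request a LOCK-prefixed increment with
// GCC/Clang inline asm on x86-64 (long long is 64 bits here, hence incq).
static inline void locked_inc(long long* p) {
    asm volatile("lock incq %0" : "+m"(*p) : : "cc");
}

// Portable equivalent: the compiler typically emits lock add / lock inc.
static inline void locked_inc_portable(std::atomic<long long>& a) {
    a.fetch_add(1, std::memory_order_seq_cst);
}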
The reason for this is, of course, as mentioned by others: speed. Why make something atomic, and necessarily slower, when it isn't always required?
You really don't want a guaranteed atomic operation unless you need it. From Agner Fog's Software optimization resources, instruction_tables.pdf (1996–2017):
Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand.
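If you want to see that cost on your own machine, a rough micro-benchmark sketch along these lines works (my own illustration, not Agner Fog's code; the numbers vary a lot by CPU and cache state and are only indicative):

#include <atomic>
#include <chrono>
#include <iostream>

int main() {
    constexpr long long N = 100000000;
    volatile long long plain = 0;          // volatile so the plain loop isn't optimized away
    std::atomic<long long> locked{0};

    auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; ++i) plain = plain + 1;                               // unlocked read-modify-write
    auto t1 = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; ++i) locked.fetch_add(1, std::memory_order_relaxed);  // LOCK-prefixed on x86
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "plain:  " << ms(t1 - t0).count() << " ms\n";
    std::cout << "locked: " << ms(t2 - t1).count() << " ms\n";
}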