How does xchg work in Intel Assembly Language

Tags:

x86

assembly

Can someone explain how xchg works in this code? Given that arrayD is a DWORD array of 1,2,3:

mov eax, arrayD ; eax=1
xchg eax, [arrayD+4]; eax=2 arrayD=2,1,3

Why isn't the array 1,1,3 after the xchg?

asked Apr 30 '18 14:04

Alloysius Goh

1 Answer

xchg works exactly as Intel's documentation says: it swaps the contents of its two operands.

I think the comment on the 2nd line is wrong. It should be eax=2, arrayD = 1,1,3. So you're correct, and you should email your instructor to say you think you've found a mistake, unless you missed something in your notes.

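xchg only stores one element, the DWORD at [arrayD+4]: it loads the 2 that was there into eax and stores the old eax (1) in its place, so arrayD[0] is never written. It can't magically look back in time to know where the value in eax came from, so a single xchg instruction can't swap two memory locations.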
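A small C model of the question's two instructions makes the data flow concrete. This is a sketch, not real x86: each C variable stands in for a register or a DWORD of arrayD, and `xchg_model`, `question_sequence` and `swap_first_two` are hypothetical names. The last function assumes the intended result was 2,1,3 and adds the one extra store (`mov arrayD, eax`) that would be needed to get there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of x86 `xchg reg, mem`: one load and one store, both at the
   SAME address. *reg and *mem trade values; nothing else is touched. */
static void xchg_model(uint32_t *reg, uint32_t *mem) {
    assert(reg != NULL && mem != NULL);
    uint32_t old_mem = *mem;  /* load the memory operand */
    *mem = *reg;              /* store the register's old value there */
    *reg = old_mem;           /* register receives the old memory value */
}

/* The question's code:
     mov  eax, arrayD        ; eax = arrayD[0]
     xchg eax, [arrayD+4]    ; swap eax with arrayD[1]
   Returns the final eax; arrayD is modified in place. */
static uint32_t question_sequence(uint32_t *arrayD) {
    uint32_t eax = arrayD[0];       /* mov eax, arrayD */
    xchg_model(&eax, &arrayD[1]);   /* xchg eax, [arrayD+4] */
    return eax;                     /* arrayD[0] was never written */
}

/* Hypothetical completion if the goal was 2,1,3: store eax back to
   arrayD[0], i.e. one extra `mov arrayD, eax` after the xchg. */
static uint32_t swap_first_two(uint32_t *arrayD) {
    uint32_t eax = question_sequence(arrayD);
    arrayD[0] = eax;                /* mov arrayD, eax */
    return eax;
}
```

Starting from {1, 2, 3}, `question_sequence` leaves eax = 2 and the array as {1, 1, 3}, matching the answer; `swap_first_two` produces {2, 1, 3} at the cost of a third instruction.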

The only way to swap 1,2 to 2,1 in one instruction would be a 64-bit rotate, like rol qword ptr [arrayD], 32 (x86-64 only).
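The rotate works because x86 is little-endian: a qword load from arrayD puts arrayD[0] in the low 32 bits and arrayD[1] in the high 32 bits, and rotating a 64-bit value by 32 exchanges those halves. Below is a portable C sketch of that effect; `rol_qword_32` is a hypothetical name, and it builds the qword explicitly instead of relying on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of `rol qword ptr [arrayD], 32` on little-endian x86-64,
   applied to the first two DWORDs that pair points at. */
static void rol_qword_32(uint32_t *pair) {
    assert(pair != NULL);
    /* Little-endian qword load: pair[0] is the low half. */
    uint64_t q = ((uint64_t)pair[1] << 32) | pair[0];
    q = (q << 32) | (q >> 32);        /* rotate left by 32 bits */
    pair[0] = (uint32_t)q;            /* low half back to pair[0] */
    pair[1] = (uint32_t)(q >> 32);    /* high half back to pair[1] */
}
```

Rotating a 64-bit value by 32 gives the same result in either direction, so `ror qword ptr [arrayD], 32` would do the same job.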

BTW, don't use xchg with a memory operand if you care about performance. It has an implicit lock prefix, so it's a full memory barrier and takes about 20 CPU cycles on Haswell/Skylake (http://agner.org/optimize/). Of course, multiple instructions can be in flight at once, but xchg mem,reg is 8 uops, vs. 2 total for a separate load + store. xchg doesn't stall the pipeline, but the memory barrier hurts a lot, and making the operation atomic is a lot of work for the CPU.
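In C terms, the trade-off looks roughly like this sketch (the function names are hypothetical): a plain load + store swap, which is what you want when only one thread touches the data, versus C11 `atomic_exchange`, which compilers targeting x86 typically implement with `xchg` on a memory operand and which therefore pays for the implicit `lock`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Plain register<->memory swap: a separate load and store.
   Not atomic; compilers normally emit two mov instructions. */
static uint32_t swap_plain(uint32_t *mem, uint32_t new_val) {
    uint32_t old = *mem;   /* load */
    *mem = new_val;        /* store */
    return old;
}

/* Atomic swap: on x86-64, GCC and Clang typically compile this to
   xchg with a memory operand (implicitly locked, full barrier). */
static uint32_t swap_atomic(_Atomic uint32_t *mem, uint32_t new_val) {
    return atomic_exchange(mem, new_val);
}
```

Both return the old value and leave the new one in memory; only `swap_atomic` guarantees that no other thread can observe or modify the location between the load and the store.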

Related:

- swapping 2 registers in 8086 assembly language(16 bits) (how to efficiently swap a register with memory). xchg is only useful for this case if you need atomicity, or if you care about code size but not speed.
- Can num++ be atomic for 'int num'?
- Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? (for the reg,reg version)

answered Nov 15 '22 09:11

Peter Cordes
