x86_64 registers rax/eax/ax/al overwriting full register contents [duplicate]

Tags: assembly, x86-64, cpu-registers, zero-extension

As is widely advertised, modern x86_64 processors have 64-bit registers that can be used in a backward-compatible fashion as 32-bit, 16-bit and even 8-bit registers, for example:

    0x1122334455667788
      ================ rax (64 bits)
              ======== eax (32 bits)
                  ====  ax (16 bits)
                  ==    ah (8 bits)
                    ==  al (8 bits)

This scheme might be taken literally, i.e. one can always read or write just one part of the register by using its designated name, which would be highly logical. In fact, this is true for everything up to 32 bits:

    mov  eax, 0x11112222 ; eax = 0x11112222
    mov  ax, 0x3333      ; eax = 0x11113333 (works, only low 16 bits changed)
    mov  al, 0x44        ; eax = 0x11113344 (works, only low 8 bits changed)
    mov  ah, 0x55        ; eax = 0x11115544 (works, only high 8 bits changed)
    xor  ah, ah          ; eax = 0x11110044 (works, only high 8 bits cleared)
    mov  eax, 0x11112222 ; eax = 0x11112222
    xor  al, al          ; eax = 0x11112200 (works, only low 8 bits cleared)
    mov  eax, 0x11112222 ; eax = 0x11112222
    xor  ax, ax          ; eax = 0x11110000 (works, only low 16 bits cleared)

However, things seem to be fairly awkward as soon as we get to 64-bit stuff:

    mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
    mov  eax, 0x55556666         ; actual:   rax = 0x0000000055556666
                                 ; expected: rax = 0x1111222255556666
                                 ; upper 32 bits seem to be lost!
    mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
    mov  ax, 0x7777              ;           rax = 0x1111222233337777 (works!)
    mov  rax, 0x1111222233334444 ;           rax = 0x1111222233334444
    xor  eax, eax                ; actual:   rax = 0x0000000000000000
                                 ; expected: rax = 0x1111222200000000
                                 ; again, it wiped whole register

This behavior seems ridiculous and illogical to me. It looks like writing anything at all to eax, by any means, wipes the high 32 bits of rax.
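The observations in the two listings above can be condensed into a small model. This is only a sketch in Python, not processor code: `write_reg` is a hypothetical helper name, and the rule it encodes (8-bit and 16-bit writes merge, 32-bit writes zero-extend) is taken from the results shown above, which also matches how the Intel manual describes general-purpose registers in 64-bit mode.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write_reg(rax, value, width, high_byte=False):
    """Return the new 64-bit register value after a sub-register write.

    width is 8, 16, 32 or 64. high_byte=True selects ah (bits 8..15)
    instead of al. Hypothetical helper for illustration only.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero-extend: the upper 32 bits are cleared.
        return value & 0xFFFF_FFFF
    if width == 16:
        # 16-bit writes merge: bits 16..63 keep their old value.
        return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8:
        # 8-bit writes merge: only the selected byte changes.
        shift = 8 if high_byte else 0
        return (rax & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
    raise ValueError(f"unsupported width: {width}")


rax = 0x1111222233334444
print(hex(write_reg(rax, 0x55556666, 32)))  # mov eax, 0x55556666
print(hex(write_reg(rax, 0x7777, 16)))      # mov ax, 0x7777
```

Under this model the "expected" values in the 64-bit listing would only appear if the 32-bit branch merged like the 16-bit one does.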

So, I have 2 questions:

1. I believe that this awkward behavior must be documented somewhere, but I can't seem to find a detailed explanation (of how exactly the high 32 bits of a 64-bit register get wiped) anywhere. Am I right that writing to eax always wipes the upper half of rax, or is it something more complicated? Does it apply to all 64-bit registers, or are there exceptions?

   A strongly related question mentions the same behavior, but, alas, there are again no exact references to documentation.

   In other words, I'd like a link to documentation that specifies this behavior.

2. Is it just me, or does this whole thing seem really weird and illogical (i.e. eax-ax-ah-al and rax-ax-ah-al having one behavior and rax-eax having another)? Maybe I'm missing some vital point about why it was implemented like that?

   An explanation of "why" would be highly appreciated.

512

asked Aug 22 '14 20:08

GreyCat

1 Answer

The processor model documented in the Intel/AMD processor manuals is a pretty imperfect model of the real execution engine of a modern core. In particular, the notion of processor registers does not match reality: there is no such thing as a single physical EAX or RAX register.

One primary job of the instruction decoder is to convert legacy x86/x64 instructions into micro-ops, the instructions of a RISC-like processor. These are small instructions that are easy to execute concurrently and can take advantage of multiple execution sub-units, allowing as many as 6 instructions to execute at the same time.

To make that work, the notion of processor registers is virtualized as well. The instruction decoder allocates a register from a big bank of registers. When the instruction is retired, the value of that dynamically allocated register is written back to whatever register currently holds the value of, say, RAX.
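A toy model may make the renaming idea more concrete. This is a minimal sketch, assuming a simple table that maps each architectural name to a physical register id; `RenameTable` and its methods are hypothetical names, and freeing physical registers at retirement is left out entirely.

```python
class RenameTable:
    """Toy register renamer: architectural names map to physical register ids."""

    def __init__(self, arch_regs, num_physical):
        if num_physical < len(arch_regs):
            raise ValueError("need at least one physical register per name")
        self.free = list(range(num_physical))
        # Each architectural register starts out in its own physical register.
        self.mapping = {name: self.free.pop(0) for name in arch_regs}

    def read(self, name):
        """Return the physical register that currently holds `name`."""
        return self.mapping[name]

    def write(self, name):
        """Allocate a fresh physical register for a new value of `name`."""
        if not self.free:
            # Assumption of this sketch: a real core would stall here instead.
            raise RuntimeError("no free physical registers")
        phys = self.free.pop(0)
        self.mapping[name] = phys
        return phys


table = RenameTable(["rax", "rbx"], num_physical=8)
first = table.write("rax")   # e.g. mov rax, 1
second = table.write("rax")  # e.g. mov rax, 2, lands in a different register
print(first, second, table.read("rax"))
```

Because the two writes to rax get different physical registers, neither has to wait for the other, which is the point of virtualizing the register names.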

To make that work smoothly and efficiently, allowing many instructions to execute concurrently, it is very important that these operations don't have interdependencies. The worst kind is a register value that depends on other instructions. The EFLAGS register is notorious for this, since many instructions modify it.

The behavior you would like has the same problem. It would require two register values to be merged when the instruction is retired, creating a data dependency that's going to clog up the core. By forcing the upper 32 bits to 0, that dependency instantly disappears: there is no longer any need to merge. Warp 9 execution speed.
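The dependency argument can be sketched in a few lines. This is a deliberately simplified model, not a description of any particular core: `sources_for_write` is a hypothetical helper, and real processors handle partial-register writes in various ways. It only shows which earlier values a write has to wait for, depending on the destination width.

```python
def sources_for_write(dest_width, old_phys, operand_phys):
    """Physical registers a register write must wait for (simplified model).

    dest_width   -- 8, 16, 32 or 64
    old_phys     -- physical register holding the destination's previous value
    operand_phys -- physical registers of the instruction's explicit inputs
    """
    if dest_width in (32, 64):
        # The whole 64-bit result is defined by the instruction itself
        # (32-bit results are zero-extended), so the old value is not needed.
        return list(operand_phys)
    if dest_width in (8, 16):
        # The untouched upper bits must be merged in from the old value,
        # which adds a dependency on whatever instruction produced it.
        return [old_phys] + list(operand_phys)
    raise ValueError(f"unsupported width: {dest_width}")


# mov eax, imm32: no register inputs at all
print(sources_for_write(32, old_phys=5, operand_phys=[]))
# mov ax, imm16: must wait for the previous value of rax
print(sources_for_write(16, old_phys=5, operand_phys=[]))
```

In this model the 8-bit and 16-bit writes keep their merge dependency, which is consistent with ax, al and ah preserving the rest of the register in the question's listings.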

154

answered Oct 13 '22 06:10

Hans Passant
