I understand in x86_64 assembly there is for example the (64 bit) rax register, but it can also be accessed as a 32 bit register, eax, 16 bit, ax, and 8 bit, al. In what situation would I not just use the full 64 bits, and why, what advantage would there be?
As an example, with this simple hello world program:
section .data
msg: db "Hello World!", 0x0a, 0x00
len: equ $-msg
section .text
global start
start:
mov rax, 0x2000004 ; System call write = 4
mov rdi, 1 ; Write to standard out = 1
mov rsi, msg ; The address of hello_world string
mov rdx, len ; The size to write
syscall ; Invoke the kernel
mov rax, 0x2000001 ; System call number for exit = 1
mov rdi, 0 ; Exit success = 0
syscall ; Invoke the kernel
rdi and rdx, at least, only need 8 bits and not 64, right? But if I change them to dil and dl, respectively (their lower 8-bit equivalents), the program assembles and links but doesn't output anything.
However, it still works if I use eax, edi and edx, so should I use those rather than the full 64-bits? Why or why not?
A 64-bit register can theoretically address 2^64 = 18,446,744,073,709,551,616 bytes, or 17,179,869,184 GiB (16 EiB) of memory. That is far more than any workstation needs to access; for comparison, it is about a billion times 16 GiB.
x64 extends x86's 8 general-purpose registers to be 64-bit, and adds 8 new 64-bit registers. The 64-bit registers have names beginning with "r", so for example the 64-bit extension of eax is called rax. The new registers are named r8 through r15.
As well, 64-bit x86 includes SSE2, so every 64-bit x86 CPU has 128-bit registers XMM0–XMM7 (and XMM8–XMM15 in 64-bit mode), which are only accessible through SSE instructions.
R just stands for "register". The AMD64 ISA extension added 8 additional general-purpose registers, named R8 through R15. The 64-bit extended versions of the original 8 registers had an R prefix added to them for symmetry. E stands for "extended" or "enhanced".
You are asking several questions here.
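If you just load the low 8 bits of a register, the rest of the register will keep its previous value, so after writing dil the kernel still sees whatever was left in the upper 56 bits of rdi. That can explain why your system call got the wrong parameters. Writing a 32-bit register behaves differently: it zeroes the upper 32 bits of the 64-bit register, which is why the version using eax, edi and edx still works.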
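A minimal sketch of how partial register writes behave, with the register contents after each instruction shown in the comments. The starting value 0x1122334455667788 is arbitrary and is only there to make the preserved bits visible; the last lines are one possible way to set up rdi for the write call in the question.

```nasm
    mov     rax, 0x1122334455667788 ; RAX = 0x1122334455667788
    mov     al, 0xFF                ; RAX = 0x11223344556677FF  (upper 56 bits kept)
    mov     ax, 0xAAAA              ; RAX = 0x112233445566AAAA  (upper 48 bits kept)
    mov     eax, 1                  ; RAX = 0x0000000000000001  (upper 32 bits zeroed)

    ; Two ways to end up with RDI == 1:
    mov     edi, 1                  ; simplest: a 32-bit write clears bits 32..63

    xor     edi, edi                ; or clear the whole register first ...
    mov     dil, 1                  ; ... and then set only the low byte
```

Only 32-bit destinations zero the rest of the register. 8-bit and 16-bit writes merge into the old value, so they are safe only when the upper bits are already known.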
One reason for using 32 bits when that is all you need is that many instructions using EAX or EBX are one byte shorter than those using RAX or RBX, because selecting the 64-bit operand size takes an extra REX prefix byte. Constants loaded into the register can also be shorter: a 32-bit immediate instead of a 64-bit one.
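To make the size difference concrete, here is a small sketch with the standard machine-code encoding of each instruction in the comments. An assembler is free to pick an equivalent shorter encoding where one exists, so treat the byte counts as the typical case; a listing such as `nasm -f bin sizes.asm -l sizes.lst` (file names are placeholders) shows what your assembler actually emits.

```nasm
bits 64
    mov     eax, ebx                ; 89 D8                          (2 bytes)
    mov     rax, rbx                ; 48 89 D8                       (3 bytes, REX.W prefix)
    add     eax, 1                  ; 83 C0 01                       (3 bytes)
    add     rax, 1                  ; 48 83 C0 01                    (4 bytes)
    xor     eax, eax                ; 31 C0                          (2 bytes, zeroes all of RAX)
    xor     rax, rax                ; 48 31 C0                       (3 bytes, same result)
    mov     eax, 0x11223344         ; B8 44 33 22 11                 (5 bytes)
    mov     rax, 0x1122334455667788 ; 48 B8 88 77 66 55 44 33 22 11  (10 bytes)
```

The saving applies to the original eight registers. r8 through r15 need a REX prefix at any operand size, so r8d is no shorter to encode than r8.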
The instruction set has evolved over a long time and has quite a few quirks!
First and foremost would be when loading a smaller (e.g. 8-bit) value from memory (reading a char, working on a data structure, deserialising a network packet, etc.) into a register.
MOV AL, [0x1234]
versus
MOV RAX, [0x1234]
AND RAX, 0xFF
; assuming all 8 bytes at 0x1234..0x123B are actually accessible.
; x86 is little-endian, so the byte at 0x1234 lands in the low
; 8 bits of RAX; SHR RAX, 56 would instead extract the byte at 0x123B.
Or, of course, writing said value back to memory.
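(Edit, like 6 years later):
Since this keeps coming up:
MOV AL, [0x1234]
reads a single byte of memory at 0x1234 and leaves the other 56 bits of RAX unchanged.
By contrast:
MOV RAX, [0x1234]
reads 8 bytes of memory starting at 0x1234 and overwrites all of RAX. The bytes arrive in the CPU's little-endian order, which data such as network packets often does not share (hence the extra shift or mask after the load, years ago).
Also important to note:
MOV EAX, [0x1234]
reads 4 bytes of memory and zero-extends them, so it overwrites all 64 bits of RAX.
Then, as mentioned in the comments, there is:
MOVZX EAX, byte [0x1234]
which reads a single byte and zero-extends it to fill all of EAX (and therefore all of RAX).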
In all of these cases, if you want to write from the 'A' register into memory you'd have to pick your width:
MOV [0x1234], AL ; write a byte (8 bits)
MOV [0x1234], AX ; write a word (16 bits)
MOV [0x1234], EAX ; write a dword (32 bits)
MOV [0x1234], RAX ; write a qword (64 bits)
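Putting the load and store widths together, a byte-sized read-modify-write might look like the sketch below. The label `counter` and its initial value are made up for illustration, and the fragment assumes it is assembled as part of a larger 64-bit program.

```nasm
section .data
counter: db 41                      ; hypothetical one-byte variable

section .text
    movzx   eax, byte [rel counter] ; read 1 byte, zero-extended: RAX = 41
    add     eax, 1                  ; do the arithmetic at 32-bit width
    mov     [rel counter], al       ; write back exactly 1 byte
```

Loading with movzx avoids merging into the old contents of RAX, and storing from al ensures that the bytes following `counter` in memory are not touched.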