I've been learning assembly, and I've read that the four main x86 general purpose registers (eax, ebx, ecx, and edx) each had an intended or suggested purpose. For example, eax is the accumulator register, ecx is used as a counter for loops, and so on. Do most compilers attempt to use registers for the suggested purpose, or do they ignore what the registers are "supposed" to be for and just assign values to the next available register?
Also, when looking at the x64 registers, I noticed that an extra eight general purpose registers were added, bringing the total number of gp registers to twelve if you ignore rbp, rsp, rsi, and rdi (since they have non-general purpose uses), and sixteen if you do include them. In normal user programs (i.e. browsers, word processors, etc, and not cryptographic programs that require lots of registers), how many of these registers are normally in use at any given time? Is it common for a program like, say, Firefox to be using all 12/16 normal registers at once, or do they only use a subset since they don't have enough variables to fill them all? I will look into this myself by disassembling binaries to see what the general case is, but I would appreciate an answer from someone more knowledgeable than I.
Also, do compilers normally use semi-gp registers (rsi, rdi, rsp, and rbp) for general purpose use if they're not currently being used for their non-general application? I was curious because I saw these registers listed as "general purpose," but even I can think of instances off the top of my head where these registers can't be used for general storage (for example, you wouldn't want to store variables to rbp and rsp and then push values to the stack!). So do compilers try to make use of these registers when they can? Is there a difference between x86 and x64 compilation, since x64 processors have more registers available, so that it isn't necessary to stuff variables into any available register?
All GP registers are general.
They have special meaning only when specific, usually legacy, instructions are executed.
For example, of the quadruplet rsi, rdi, rbp, rsp, only the last has a special purpose, and that's due to instructions like call, ret, push and so on.
If you don't use those instructions, even implicitly (admittedly an unlikely situation), you can use rsp as an accumulator too.
This principle is general and compilers exploit it.
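To make "special meaning only for specific instructions" concrete, here is a minimal C sketch. The function names are made up for illustration, and the register notes in the comments describe what the x86 instruction encodings require, not what any particular compiler is guaranteed to emit. A variable shift count must live in cl and div works on edx:eax, whereas a plain addition has no implicit operands and can use any register.

```c
#include <assert.h>
#include <stdint.h>

/* Variable shift: the classic shl/shr encodings take a non-constant count
   only in CL, so a compiler usually has to move `count` into ecx first.
   (With BMI2 enabled it may choose shlx, which accepts any register.) */
uint32_t shift_left(uint32_t value, uint32_t count)
{
    return value << (count & 31u); /* mask keeps the shift well defined in C */
}

/* Unsigned division: `div` takes its dividend in edx:eax and leaves the
   quotient in eax and the remainder in edx. */
uint32_t quotient(uint32_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

uint32_t remainder_of(uint32_t dividend, uint32_t divisor)
{
    return dividend % divisor;
}

/* Plain addition has no implicit operands, so the register allocator is
   free to keep a and b wherever it likes. */
uint32_t add(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Small self-check of the functions above. */
void run_examples(void)
{
    assert(shift_left(1u, 4u) == 16u);
    assert(quotient(17u, 5u) == 3u);
    assert(remainder_of(17u, 5u) == 2u);
    assert(add(2u, 3u) == 5u);
}
```

Compiling this with optimizations and reading the output (for example with gcc -O2 -S) should show ecx and edx:eax appearing only around the shift and the division; the exact listing depends on the compiler version and flags.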
Consider this artificial example[1]:
void maxArray(int* x, int* y, int* z, short* w) {
    for (int i = 0; i < 65536; i++)
    {
        int a = y[i]*z[i];
        int b = z[i]*z[i];
        int c = y[i]*x[i]-w[i];
        int d = w[i]+x[i]-y[i];
        int e = y[i+1]*w[i+2];
        int f = w[i]*w[i];
        x[i] = a*a-b+d;
        y[i] = b-c*d/f+e;
        z[i] = (e+f)*2-4*a*d;
        w[i] = a*b-c*d+e*f;
    }
}
GCC compiles it into this listing:
maxArray(int*, int*, int*, short*):
        push r13
        push r12
        xor r8d, r8d
        push rbp
        push rbx
        mov r12, rdx
.L2:
        mov edx, DWORD PTR [rsi+r8*2]
        mov ebp, DWORD PTR [r12+r8*2]
        movsx r11d, WORD PTR [rcx+r8]
        mov eax, DWORD PTR [rdi+r8*2]
        movsx ebx, WORD PTR [rcx+4+r8]
        mov r9d, edx
        mov r13d, edx
        imul r9d, ebp
        imul r13d, eax
        lea r10d, [rax+r11]
        imul ebx, DWORD PTR [rsi+4+r8*2]
        mov eax, r9d
        sub r10d, edx
        imul ebp, ebp
        sub r13d, r11d
        imul eax, r9d
        imul r11d, r11d
        sub eax, ebp
        add eax, r10d
        mov DWORD PTR [rdi+r8*2], eax
        mov eax, r13d
        imul eax, r10d
        cdq
        idiv r11d
        mov edx, ebp
        sub edx, eax
        mov eax, edx
        lea edx, [0+r9*4]
        add eax, ebx
        mov DWORD PTR [rsi+r8*2], eax
        lea eax, [rbx+r11]
        imul r9d, ebp
        imul r11d, ebx
        add eax, eax
        imul edx, r10d
        add r9d, r11d
        imul r10d, r13d
        sub eax, edx
        sub r9d, r10d
        mov DWORD PTR [r12+r8*2], eax
        mov WORD PTR [rcx+r8], r9w
        add r8, 2
        cmp r8, 131072
        jne .L2
        pop rbx
        pop rbp
        pop r12
        pop r13
        ret
You can see that most of the GP registers are used (I haven't counted them), including rbp, rsi and rdi.
None of the registers is restricted to its canonical role.
Note: in this example rsi and rdi are each used both to read and to write an array, but that's a coincidence.
It has nothing to do with their historical "source index" and "destination index" roles: under the System V AMD64 calling convention that GCC uses here, those registers pass the first two integer/pointer arguments.
int sum(int a, int b, int c, int d)
{
    return a+b+c+d;
}

sum(int, int, int, int):
        lea eax, [rdi+rsi]
        add eax, edx
        add eax, ecx
        ret
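The same pattern extends beyond four arguments. The sketch below is only an illustration (sum7 is a made-up name) and assumes the System V AMD64 ABI used on Linux, macOS and the BSDs, where the first six integer/pointer arguments arrive in rdi, rsi, rdx, rcx, r8 and r9 and any further ones go on the stack. The Windows x64 convention is different (rcx, rdx, r8, r9, then the stack), so the comments do not apply there.

```c
#include <assert.h>

/* Assumed target: System V AMD64 ABI.
   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9,
   g -> passed on the stack; the result is returned in rax. */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}

/* Small self-check of the function above. */
void check_sum7(void)
{
    assert(sum7(1, 2, 3, 4, 5, 6, 7) == 28);
}
```

Nothing here makes rdi or rsi "index" registers: they hold a and b only because the calling convention says so, and the compiler is free to reuse them for anything else once the arguments have been consumed.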