Do compilers usually use registers for their "intended" purpose?

I've been learning assembly, and I've read that the four main x86 general purpose registers (eax, ebx, ecx, and edx) each had an intended or suggested purpose. For example, eax is the accumulator register, ecx is used as a counter for loops, and so on. Do most compilers attempt to use registers for the suggested purpose, or do they ignore what the registers are "supposed" to be for and just assign values to the next available register?

Also, when looking at the x64 registers, I noticed that an extra eight general purpose registers were added, bringing the total number of gp registers to twelve if you ignore rbp, rsp, rsi, and rdi (since they have non-general purpose uses), and sixteen if you do include them. In normal user programs (e.g. browsers, word processors, etc., and not cryptographic programs that require lots of registers), how many of these registers are normally in use at any given time? Is it common for a program like, say, Firefox to be using all 12/16 normal registers at once, or do they only use a subset since they don't have enough variables to fill them all? I will look into this myself by disassembling binaries to see what the general case is, but I would appreciate an answer from someone more knowledgeable than I.

Also, do compilers normally use semi-gp registers (rsi, rdi, rsp, and rbp) for general purpose use if they're not currently being used for their non-general application? I was curious because I saw these registers listed as "general purpose," but even I can think of instances off the top of my head where these registers can't be used for general storage (for example, you wouldn't want to store variables to rbp and rsp and then push values to the stack!). So do compilers try to make use of these registers when they can? Is there a difference between x86 and x64 compilation, since x64 processors have more registers available, so that it isn't necessary to stuff variables into any available register?

James Preston asked Apr 06 '17 16:04


1 Answer

All GP registers are general.
They have special meaning only when specific, usually legacy, instructions are executed.

For example, of the quadruplet rsi, rdi, rbp, rsp, only the last has a special purpose in everyday code, and that's due to instructions like call, ret, push and so on, which use it implicitly as the stack pointer.
If you don't use those instructions, even implicitly (an admittedly unlikely situation), you can use rsp as an accumulator too.

This principle is general and compilers exploit it.
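
To make "special meaning only when specific instructions are executed" concrete, here is a minimal sketch using GCC/Clang inline assembly. The helper name copy_bytes is hypothetical, chosen for illustration. rep movsb takes no explicit operands: it always reads from [rsi], writes to [rdi] and counts down rcx, so the "S", "D" and "c" constraints pin exactly those registers. The sketch assumes an x86-64 target with the direction flag clear (as the usual calling conventions require) and falls back to memcpy elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst.
   On x86-64 it uses `rep movsb`, which implicitly uses rsi (source),
   rdi (destination) and rcx (count); the constraints "S", "D" and "c"
   force the compiler to place the operands in those registers. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n) /* all three are modified */
                         :
                         : "memory");
#else
    memcpy(dst, src, n); /* portable fallback for other targets */
#endif
}
```

Outside of that one instruction the compiler is free to use rsi, rdi and rcx for anything it likes.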

Consider this artificial example[1]:

void maxArray(int* x, int* y, int*z, short* w) {
    for (int i = 0; i < 65536; i++)
    {
        int a = y[i]*z[i];
        int b = z[i]*z[i];
        int c = y[i]*x[i]-w[i];
        int d = w[i]+x[i]-y[i];
        int e = y[i+1]*w[i+2];
        int f = w[i]*w[i];

        x[i] = a*a-b+d; 
        y[i] = b-c*d/f+e;
        z[i] = (e+f)*2-4*a*d;
        w[i] = a*b-c*d+e*f;
    }
}

GCC compiles it into this listing:

maxArray(int*, int*, int*, short*):
        push    r13
        push    r12
        xor     r8d, r8d
        push    rbp
        push    rbx
        mov     r12, rdx
.L2:
        mov     edx, DWORD PTR [rsi+r8*2]    
        mov     ebp, DWORD PTR [r12+r8*2]
        movsx   r11d, WORD PTR [rcx+r8]
        mov     eax, DWORD PTR [rdi+r8*2]
        movsx   ebx, WORD PTR [rcx+4+r8]
        mov     r9d, edx
        mov     r13d, edx
        imul    r9d, ebp
        imul    r13d, eax
        lea     r10d, [rax+r11]
        imul    ebx, DWORD PTR [rsi+4+r8*2]
        mov     eax, r9d
        sub     r10d, edx
        imul    ebp, ebp
        sub     r13d, r11d
        imul    eax, r9d
        imul    r11d, r11d
        sub     eax, ebp
        add     eax, r10d
        mov     DWORD PTR [rdi+r8*2], eax
        mov     eax, r13d
        imul    eax, r10d
        cdq
        idiv    r11d
        mov     edx, ebp
        sub     edx, eax
        mov     eax, edx
        lea     edx, [0+r9*4]
        add     eax, ebx
        mov     DWORD PTR [rsi+r8*2], eax
        lea     eax, [rbx+r11]
        imul    r9d, ebp
        imul    r11d, ebx
        add     eax, eax
        imul    edx, r10d
        add     r9d, r11d
        imul    r10d, r13d
        sub     eax, edx
        sub     r9d, r10d
        mov     DWORD PTR [r12+r8*2], eax
        mov     WORD PTR [rcx+r8], r9w
        add     r8, 2
        cmp     r8, 131072
        jne     .L2
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret

You can see that most of the GP registers are used: 13 of the 16 appear explicitly (only r14 and r15 are untouched, and rsp is used implicitly by push, pop and ret), including rbp, rsi and rdi.
None of them is restricted to its canonical role.
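
The one place in the listing where registers do play a fixed role is the cdq / idiv pair: idiv always takes its dividend from edx:eax and leaves the quotient in eax and the remainder in edx, so the compiler has to route values through those two registers. A minimal sketch that isolates this (the function names are mine, for illustration only):

```c
/* Signed 32-bit division. On x86-64 this is normally implemented with
   idiv, whose operands are fixed by the instruction itself:
   dividend in edx:eax, quotient in eax, remainder in edx. */
int quotient(int a, int b)
{
    return a / b; /* result is read back from eax */
}

int remainder_of(int a, int b)
{
    return a % b; /* result is read back from edx */
}
```

With GCC at -O2 on a System V x86-64 target, quotient typically comes out as something like mov eax, edi / cdq / idiv esi / ret, though the exact output depends on compiler version and flags. Everywhere else eax and edx are just two more scratch registers.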

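Note: in this example rsi and rdi are each used both to load from and to store to an array; that's a coincidence.
Those registers are simply the ones used to pass the first two integer/pointer arguments (in the System V x86-64 calling convention that GCC follows on Linux), as this smaller example shows: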

int sum(int a, int b, int c, int d)
{
    return a+b+c+d;
}

sum(int, int, int, int):
        lea     eax, [rdi+rsi]
        add     eax, edx
        add     eax, ecx
        ret

Margaret Bloom answered Sep 24 '22 20:09