
When would the compiler be conservative regarding pointer dereferencing optimization, if at all?

So, I recently took an interest in how good the compiler (gcc (GCC) 4.8.3 being the one in question) is at optimizing pointers and pointer dereferencing.

Initially I created a simple integer and an integer pointer and performed operations on them so I could print the result. As expected, all the hard-coded operations were optimized away, whether they went through a dereferenced pointer or not.

call    __main
leaq    .LC0(%rip), %rcx
movl    $1, %edx
call    printf

And even after I created a function that takes an int pointer, dereferences it and changes the value, it was still perfectly optimized.

call    __main
leaq    .LC0(%rip), %rcx
movl    $-1, %edx
call    printf

Now, when I treated my pointer as a void pointer and made changes by casting it to char * and dereferencing it, it was actually still optimized perfectly (with an 'extra' mov instruction, since I initially treated it as a 4-byte value and then as a 1-byte value for the pointer dereference):

call    __main
movl    $4, 44(%rsp)
movb    $2, 44(%rsp)
leaq    .LC0(%rip), %rcx
movl    44(%rsp), %eax
leal    1(%rax), %edx
call    printf

So onto my question(s):

  1. How consistent is compiler optimization regarding pointer dereferencing? What would be some cases where it would choose to be conservative?

  2. If all of my pointers in a project were declared with the restrict keyword, could I trust it would be as well optimized as if 'no pointers were being used at all'?

(assuming there are no volatile cases )

Ps¹.: I am aware the compiler generally does a good enough job, and that a programmer worrying about aiding the compiler in minor optimizations is, in general, unproductive (as so many point out in stackoverflow answers to questions regarding optimization). Yet I still have curiosity regarding the matter.

Ps².: gcc -O3 -S -c main.c was the command used to generate the assembly code

C Code: (as requested)

1:

#include <stdio.h>

int main (void)
{
    int a = 4;
    int *ap = &a;

    *ap = 0;
    a += 1;

    printf("%d\n", a);
    return 0;
}

2:

#include <stdio.h>

void change(int *p) {
    *p -= 2;
}

int main (void)
{
    int a = 4;
    int *ap = &a;

    *ap = 0;
    change(ap);
    a += 1;

    printf("%d\n", a);
    return 0;
}

3:

#include <stdio.h>

void change(void *p) {
    *((char*)p) += 2;
}

int main (void)
{
    int a = 4;
    void *ap = (void*) &a;

    *((char*)(ap)) = 0;
    change(ap);
    a += 1;

    printf("%d\n", a);
    return 0;
}
asked Jul 20 '15 22:07 by SSWilks



1 Answer

LLVM and GCC both emit static single assignment (SSA) form code as part of optimization analysis. One of the useful properties of SSA code is that it precisely shows the flow of influence between assignments. That is, the compiler knows which assignments lead to other assignments and so can detect which values can influence which others.

The first influence chain looks something like

a1 -> constant(0) -> ap -> a2

The second: a1 -> constant(0) -> ap -> p -> a2

The third is pretty similar to the second. (Sorry, this notation is pretty much made up, but I hope it illustrates my point.)

Because it is fairly simple to prove that the influence of a on ap is deterministic, the compiler will feel free to dereference 'early' and combine the instructions into one (though in the first two cases this isn't the most accurate statement, since the constant overwrites the original value and lets the compiler prove that the original assignment does not flow to the end of the code).

Making the compiler more conservative about dereferencing would involve code complicated enough to escape the compiler's understanding (difficult in a static program, I think) or, more likely, causing the compiler to insert a phi function during SSA construction (in layman's terms, causing the assignment to be influenced by multiple previous assignments) in a nondeterministic way.

The restrict keyword is a promise to the compiler that, within the pointer's scope, the object it points to is not accessed through any other pointer, so two restrict-qualified pointers can be assumed not to alias. It would not remove the need for a runtime dereference if the code that produced the pointer still had a nondeterministic source (for example, if runtime-created data influenced which pointer value was dereferenced; I think this could happen if a serialized pointer were sent into the program from an external source).

answered Oct 02 '22 09:10 by argentage