Where are string constants stored by GCC, and where are these pointers mapped from?

When I compile the following C program with GCC and run it on my Linux x86_64 machine:

#include <stdio.h>

int main(void)
{
    char *p1 = "hello";               // Pointers to strings
    char *p2 = "hello";               // Pointers to strings
    if (p1 == p2) {                       // They are equal
        printf("equal %p %p\n", p1, p2);  // equal 0x40064c 0x40064c
                                          // This is always the output on my machine
    }
    else {
        printf("NotEqual %p %p\n", p1, p2);
    }
}

I always get the output as:

equal 0x40064c 0x40064c

I understand that strings are stored in a constant table, but the addresses are very low compared to those of dynamically allocated memory.

Compare with the following program:

#include <stdio.h>

int main(void)
{
    char p1[] = "hello";                    // char array
    char p2[] = "hello";                    // char array
    if (p1 == p2) {
        printf("equal %p %p\n", p1, p2);
    }
    else {                                  // Never equal
        printf("NotEqual %p %p\n", p1, p2); // NotEqual 0x7fff4b25f720 0x7fff4b25f710
                                            // Different pointers every time
                                            // Pointer values too large
    }
}

The two pointers are not equal, because these are two arrays which can be independently manipulated.

I want to know how GCC generates the code for these two programs and how they are mapped to memory during execution. Since this has probably been documented many times already, any links to documentation are welcome as well.

Xolve asked Sep 12 '12 18:09

1 Answer

In both cases the compiler emits the actual bytes of the string "hello" just once, in the .rodata section of the program (rodata stands for read only data).

They are mapped directly from the executable file into memory, much like the code section. That's why their addresses are so far from those of stack and heap memory.

Then:

char *p = "hello";

Simply initializes p to the address of this (read-only) data. And obviously:

char *q = "hello";

Gets the very same address. This is called string pooling; it is a popular but optional compiler optimization, so the C standard does not guarantee that the two pointers compare equal.

But when you write:

char p[] = "hello";

It will probably generate something like this:

char p[6];
memcpy(p, "hello", 6);

Here "hello" is actually the address of the read-only pooled string.

The call to memcpy is for illustration purposes only. The compiler may very well do the copy inline rather than with a function call.

If later you do:

char q[] = "hello";

It will define another array and perform another memcpy(). So: same data, but different addresses.

But where will these array variables reside? Well, that depends.

  • If they are local, non-static variables: on the stack.
  • If they are global variables: in the .data section of the executable, stored there with the correct characters already in place, so no memcpy is needed at run time. That is convenient, because such a memcpy would have to run before main.
  • If they are local static variables: exactly the same as with global variables. Both kinds are said to have static storage duration.

About the documentation links, sorry, I don't know of any.

But who needs documentation when you can do the experiments yourself? The best tool for that is objdump: it can disassemble the program, dump the data sections, and much more!

I hope this answers your questions.

rodrigo answered Oct 20 '22 09:10