
Why are these two pointers equal? Seeking clarification.

Tags: c

I have a simple program:

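#include <stdio.h>  /* needed for printf */

int main(void) {
    char *c = "message";
    char *z = "message";

    /* Compares the two addresses, not the characters they point to. */
    if ( c == z )
        printf("Equal!\n");
    else
        printf("Not equal!\n");
    return 0;
}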

I wanted to know why this prints Equal!, even when compiled with optimisations turned off (-O0). This would indicate that both c and z point to the same area of memory, and thus the first mutation of z (for example, changing z[0] to a) will be expensive (requiring a copy-on-write).

My understanding of what's happening is that I'm not declaring an array of type char, but rather am creating a pointer to the first character of a string literal. Thus, c and z are both stored in the data segment, not on the stack (and because they're both pointing to the same string literal, c == z is true).

This is different to writing:

char c[] = "message";
char z[] = "message";

if ( c == z ) printf("Equal\n");
else printf("Not equal!\n");

which prints Not equal!, because c and z are both stored in mutable sections of memory (i.e., the stack), and are stored separately so that a mutation of one doesn't affect the other.

My question is: is the behaviour I'm seeing (c == z being true) defined behaviour? It seems surprising that the string char *c points to is stored in the data segment, despite not being declared as const.

Is the behaviour when I try to mutate char *z defined? Why, if char *c = "message" is put in the data segment and is thus read-only, do I get a bus error rather than a compiler error? For example, if I do this:

char *c = "message";
c[0] = 'a';

I get:

zsh: bus error  ./a.out

although it compiles happily.

Any further clarification of what's happening here and why would be appreciated.

simont asked May 31 '13 01:05


3 Answers

"the first mutation of z (for example, changing z[0] to a) will be expensive (requiring a copy-on-write)."

Not "expensive"; try "undefined". Modifying a string literal is undefined behaviour, so treat string literals as constants.
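
If the goal is a string you can actually modify, one option is to copy the literal into memory you own. Declaring pointers to literals as const char * also helps, because an accidental write then becomes a compile-time diagnostic rather than a crash. A minimal sketch, where mutable_copy and msg are hypothetical names chosen for illustration, not standard functions:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of s,
   or NULL if allocation fails. The caller must free() the result. */
char *mutable_copy(const char *s)
{
    size_t n = strlen(s) + 1;      /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Pointing at a literal through const char * documents that it is
   read-only: a line such as  msg[0] = 'a';  would then be rejected
   by the compiler instead of failing at run time. */
const char *msg = "message";
```

Calling mutable_copy(msg) and then writing to element 0 of the result is fine, since the write goes to the heap copy and never touches the literal.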

Elazar answered Nov 09 '22 05:11


The C 2011 Standard. Section 6.4.5. String Literals. Paragraph 7

It is unspecified whether these arrays [string literals] are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

This means that if two string literals have the same value, the compiler may store them at the same location in memory or at different locations. Either choice is allowed, so a program must not rely on c == z being true or false.
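
Since the address comparison is unspecified for identical literals, code that cares about the text should compare contents instead. A small sketch of the difference, where same_address and same_text are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* True when both pointers refer to the same address. For two identical
   string literals this result is unspecified, so do not rely on it. */
bool same_address(const char *a, const char *b)
{
    return a == b;
}

/* True when both strings hold the same characters, wherever they live. */
bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With the question's two pointers, same_text(c, z) is true on every conforming compiler, while same_address(c, z) may be true or false depending on how the compiler laid out the literals.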

Bill Lynch answered Nov 09 '22 06:11


One of the steps a C compiler typically performs is to collect all the string constants in the code. It will usually store only one copy of any immutable string, even if the string appears in the code twice (the standard allows this but does not require it). So in your example, you have "message" twice -- the compiler will store m e s s a g e \0 in the file (in the read-only data section), and then will initialize both of those pointers to point at the string.

Try making the strings different, and the pointers should now be different too.

Also, try this: print your executable (if it's called a.out, run cat -v a.out). You will see the strings "message", "Equal!" and "Not equal!" sitting in the executable.

Update: (deleted because it was wrong.) Here's the code generated for the array version (char c[] = "message"):

8048449:       c7 44 24 1c 6d 65 73 73   movl   $0x7373656d,0x1c(%esp)
8048451:       c7 44 24 20 61 67 65 00   movl   $0x656761,0x20(%esp)
8048459:       c7 44 24 24 6d 65 73 73   movl   $0x7373656d,0x24(%esp)
8048461:       c7 44 24 28 61 67 65 00   movl   $0x656761,0x28(%esp)
8048469:       b8 70 85 04 08            mov    $0x8048570,%eax

It's creating this hex string twice (my machine is little-endian):

6d 65 73 73 61 67 65 00
m  e  s  s  a  g  e  \0

so you're right -- in the array version, it is putting the string in mutable memory (each array gets its own copy on the stack).
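
The same thing can be checked from C without reading the disassembly. A short sketch (arrays_are_independent is a name made up for this example) that modifies one array and confirms the other is unchanged:

```c
#include <stdbool.h>

/* Returns true if changing one array leaves the other untouched. */
bool arrays_are_independent(void)
{
    char c[] = "message";   /* each array gets its own copy of the bytes */
    char z[] = "message";
    const char *pc = c;     /* address of c's first element */
    const char *pz = z;     /* address of z's first element */

    z[0] = 'a';             /* allowed: z is an ordinary writable array */

    /* Distinct objects have distinct addresses, and c still starts with 'm'. */
    return pc != pz && c[0] == 'm' && z[0] == 'a';
}
```

This is well defined because c and z are separate array objects initialised from the literal, so the write goes to z's own storage and not to the literal itself.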

Robert Martin answered Nov 09 '22 06:11