GCC vs Clang copying struct flexible array member

Tags: c, gcc, clang

Consider the following code snippet.

#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcd" };

int main(int argc, const char **argv) {
    s second = first;
    printf("%s\n%s\n", first.str, second.str);
}

When I compile this with GCC 7.2, I get:

$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd

But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:

$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here

Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?

Edit

Some comments and an answer suggested this might be due to optimization. GCC may make second an alias of first, so updating second should disallow GCC from doing that optimization. I added the line:

second._ = 1;

But this doesn't change the output.

asked Oct 18 '17 04:10 by bnaecker


1 Answer

Here's the real answer of what's going on with gcc. second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.

Additionally, the declaration s second = first; corrupts the stack: (a) gcc allocates only the minimum amount of storage for second, but (b) it copies all of first, including the string, into second.

Here is a modified version of the original code which shows this:

#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcdefgh" };
int main(int argc, const char **argv) {
    char v[] = "xxxxxxxx";
    s second = first;
    printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
    printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}

On my 32-bit Linux machine, with gcc, I get the following output:

0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>

As the addresses show, v and second are on the stack, and first is in the data section. The initialization of second has also overwritten v on the stack: instead of the expected <xxxxxxxx>, v prints as <defgh>. Assuming a 4-byte int, second.str starts at 0xbf89a300, so the fourth character of the copied string lands exactly on v at 0xbf89a303.
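The overflow follows from how the type is sized. Here is a minimal sketch comparing the bytes an automatic s reserves with the bytes the copied string would need. The helper names s_local_size and s_needed_size are hypothetical and not part of the original code.

```c
#include <stddef.h>
#include <string.h>

typedef struct s {
    int _;
    char str[];
} s;

/* Hypothetical helper: bytes reserved for an automatic object such as
   `s second;`. sizeof ignores the flexible array member (apart from any
   trailing padding the compiler chooses to add). */
size_t s_local_size(void) {
    return sizeof(s);
}

/* Hypothetical helper: bytes needed to hold an s whose str stores `text`,
   including the terminating NUL. */
size_t s_needed_size(const char *text) {
    return offsetof(s, str) + strlen(text) + 1;
}
```

With a 4-byte int and no extra padding, s_local_size() is 4 while s_needed_size("abcdefgh") is 13, so copying the whole of first writes 9 bytes past the end of second.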

This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.

Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:

s second;
second = first;

The real problem is the assignment. It's copying all of first, rather than the minimal common part of the structure type, which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do, v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit or not.
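If the goal is a copy that actually includes the string, plain assignment is not the tool for it under either compiler. Below is a minimal sketch of one common approach, assuming the copy may live on the heap and that str holds a NUL-terminated string. The helper name s_clone is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct s {
    int _;
    char str[];
} s;

/* Hypothetical helper: heap-allocate a copy of *src, string included.
   Returns NULL if allocation fails; the caller frees the result. */
s *s_clone(const s *src) {
    size_t len = strlen(src->str) + 1;      /* string plus terminating NUL */
    s *dst = malloc(sizeof(*dst) + len);    /* fixed part + room for str */
    if (dst == NULL) {
        return NULL;
    }
    dst->_ = src->_;                        /* copy the fixed member */
    memcpy(dst->str, src->str, len);        /* copy the flexible array contents */
    return dst;
}
```

Calling s_clone(&first) and later passing the result to free() gives a second object whose str is guaranteed to have storage behind it, which does not depend on how the compiler treats struct assignment.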

answered Nov 08 '22 04:11 by Tom Karzes