
gcc/clang lay out fields of a derived struct in the back-padding of base struct [duplicate]

I'm confused by how gcc and clang lay out structs when both padding and inheritance are involved. Here's a sample program:

#include <string.h>
#include <stdio.h>

struct A
{
    void* m_a;
};

struct B: A
{
    void* m_b1;
    char m_b2;
};

struct B2
{
    void* m_a;
    void* m_b1;
    char m_b2;
};

struct C: B
{
    short m_c;
};

struct C2: B2
{
    short m_c;
};

int main ()
{
    C c;
    memset (&c, 0, sizeof (C));
    memset ((B*) &c, -1, sizeof (B));

    printf (
        "c.m_c = %d; sizeof (A) = %d sizeof (B) = %d sizeof (C) = %d\n", 
        c.m_c, (int) sizeof (A), (int) sizeof (B), (int) sizeof (C)
        );

    C2 c2;
    memset (&c2, 0, sizeof (C2));
    memset ((B2*) &c2, -1, sizeof (B2));

    printf (
        "c2.m_c = %d; sizeof (A) = %d sizeof (B2) = %d sizeof (C2) = %d\n", 
        c2.m_c, (int) sizeof (A), (int) sizeof (B2), (int) sizeof (C2)
        );

    return 0;
}

Output:

$ ./a.out
c.m_c = -1; sizeof (A) = 8 sizeof (B) = 24 sizeof (C) = 24
c2.m_c = 0; sizeof (A) = 8 sizeof (B2) = 24 sizeof (C2) = 32

Structs C and C2 are laid out differently. In C, m_c is allocated in the back-padding of struct B and is therefore overwritten by the second memset (); with C2 this doesn't happen.

Compilers used:

$ clang --version
Ubuntu clang version 3.3-16ubuntu1 (branches/release_33) (based on LLVM 3.3)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ c++ --version
c++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The same happens with -m32 option (sizes in the output will be different, obviously).

Neither the x86 nor the x86_64 version of the Microsoft Visual Studio 2010 C++ compiler has this issue (i.e. they lay out structs C and C2 identically).

If it's not a bug and is by design, then my questions are:

  1. what are the precise rules for allocating or not allocating fields of a derived struct in the back-padding (e.g. why it doesn't happen with C2?)
  2. is there any way to override this behaviour with switches/attributes (i.e. lay out just like MSVC does)?

Thanks in advance.

Vladimir

user3922059 asked Aug 10 '14 07:08


3 Answers

For everyone downvoting this question and OP's self-answer with self-righteous indignation over how terribly UB his hand-written memset was... consider that the implementors of both libc++ and libstdc++ fall into the exact same pit. For the foreseeable future it is actually really important to understand when tail-padding is reused and when it's not. Good on OP for bringing up this issue.

The Itanium C++ ABI specifies the rules for struct layout. The relevant wording is:

If D is a base class, update sizeof(C) to max (sizeof(C), offset(D)+nvsize(D)).

Here "the dsize, nvsize, and nvalign of [POD types] are defined to be their ordinary size and alignment," but the nvsize of a non-POD type is defined to be "the non-virtual size of an object, which is the size of O without virtual bases [and also without tail padding]." So if D is POD, we never nestle anything into its tail padding; whereas if D is not POD, we are allowed to nestle the next member (or base) into its tail padding.

Therefore, any non-POD type (even a trivially copyable one!) must consider the possibility that it has important data stuffed into its tail padding. This generally violates implementors' assumptions about what's permissible to do with trivially copyable types (namely, that you can trivially copy them).

Wandbox test case:

#include <algorithm>
#include <stdio.h>

struct A {
    int m_a;
};

struct B : A {
    int m_b1;
    char m_b2;
};

struct C : B {
    short m_c;
};

int main() {
    C c1 { 1, 2, 3, 4 };
    B& b1 = c1;
    B b2 { 5, 6, 7 };

    printf("before operator=: %d\n", int(c1.m_c));  // 4
    b1 = b2;
    printf("after operator=: %d\n", int(c1.m_c));  // 4

    printf("before std::copy: %d\n", int(c1.m_c));  // 4
    std::copy(&b2, &b2 + 1, &b1);
    printf("after std::copy: %d\n", int(c1.m_c));  // 64, or 0, or anything but 4
}

Quuxplusone answered Oct 26 '22 22:10


Your code exhibits undefined behaviour, as C and C2 are not PODs and overwriting arbitrary parts of their data with memset or memcpy is not allowed.

However, in the slightly longer run, this is a complex issue. The existing C ABI on the platform (Unix) permitted this behaviour (it was written for C++98, which permitted it). Then the Committee changed the rules incompatibly in C++03 and C++11. Clang, at least, has a switch to change to the newer rules. The C ABI on Unix, of course, did not change to accommodate the new C++11 rules for putting things in padding, so the compilers can't simply update, as that would break ABI compatibility.

I believe that GCC is storing up ABI-breaking changes for 5.0 and this may be one of them.

Windows has always banned this practice in its C ABI and therefore does not have this problem, as far as I'm aware.

Puppy answered Oct 27 '22 00:10


The difference is that the compiler is allowed to use the padding of a previous object if that object is already "not just data", meaning that manipulating it with, say, memcpy is not supported.

The B structure is not just data, because it's a derived object. Its slack space can therefore be reused: if you're memcpy-ing a B instance around, you're already violating the contract.

B2, by contrast, is just a structure, and backward compatibility requires that its full size (including the slack space) be memory your code is allowed to play with using memcpy.

6502 answered Oct 27 '22 00:10