
What constitutes padding in a union?

I'm trying to interpret the C11 standard regarding static (and thread-local) initialisation of a union when not explicitly initialised.

Section 6.7.9p10 (pg 139) states the following:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

— if it has pointer type, it is initialized to a null pointer;

— if it has arithmetic type, it is initialized to (positive or unsigned) zero;

— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

Supposing we're on an amd64 architecture, given the following statement:

static union { uint32_t x; uint16_t y[3]; } u;

Can u.y[2] contain non-zero values or is it initialised to zero because it is regarded as padding?

I've scoured the C11 standard, but it gives little to no explanation of what constitutes padding in a union. In the C99 standard (pg 126) padding isn't mentioned, so in that case u.y[2] can be non-zero.

asked Jan 12 '19 13:01 by snappy


1 Answer

The extra space used by y that isn't used by x is not considered padding. Section 6.7.2.1p17 of the C11 standard regarding "Structure and union specifiers" states:

There may be unnamed padding at the end of a structure or union

The bytes used by y in your example that are not used by x are still named, and are therefore not padding.

Your example most likely does have this unnamed padding: the largest member takes up 6 bytes, but one of the members is a uint32_t, which typically requires 4-byte alignment, so the size of the union is rounded up to a multiple of 4. In fact, on gcc 4.8.5 the size of this union is 8 bytes. So the memory layout of this union looks like this:

            -----  --|       ---|
         0  | 0 |    |          |
            -----    |          |-- y[0]
         1  | 0 |    |          |
            -----    |-- x   ---|
         2  | 0 |    |          |            
            -----    |          |-- y[1]
         3  | 0 |    |          |
            -----  --|       ---|
         4  | 0 |               |
            -----               |-- y[2]
         5  | 0 |               |
            -----            ---|
         6  | 0 |  -- padding
            -----
         7  | 0 |  -- padding
            -----

So going by a strict reading of the standard, for a static instance of this union without an explicit initializer:

  • Bytes 0 - 3, corresponding to x (i.e. the first named member), are initialized to 0 resulting in x being 0.
  • Bytes 4 - 5, corresponding to y[2], remain uninitialized and have indeterminate values.
  • Bytes 6 - 7, corresponding to padding, are initialized to 0.

I tested this on gcc 4.8.5, clang 3.3, and MSVC 2015, and all of them set all bytes to 0 under various optimization settings. However, going by a strict reading of the standard, this behavior is not guaranteed, so other optimization settings, other versions of these compilers, or other compilers altogether may do something different.

From a pragmatic standpoint, it would make sense for a compiler to simply set all bytes of a static object to 0 to satisfy this requirement. This assumes, of course, that integer types have no padding bits, floating-point types are IEEE 754, and null pointers are represented as all-bits-zero. On most systems that people are likely to come across, this is the case. Systems where it is not might be more likely to leave these bytes set to something other than 0. So again, while these bytes might be set to 0, there is no guarantee.

An important point to keep in mind is that a union can only store one member at a time as per 6.7.2.1p16:

The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

So if a union with static storage duration has no explicit initializer, it is only safe to read the first named member, since that is the one that was implicitly initialized.

The only exception to this is if the union contains structures with a common set of initial members, in which case you can access any of the common elements of the inner structs. This is detailed in section 6.5.2.3p6:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

answered Nov 18 '22 11:11 by dbush