The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, bitfield and word sharing, or type punning. Unions can also provide low-level polymorphism.
A union is a user-defined type similar to structs in C except for one key difference. Structures allocate enough space to store all their members, whereas unions can only hold one member value at a time.
The purpose of unions is rather obvious, but for some reason people miss it quite often.
The purpose of union is to save memory by using the same memory region for storing different objects at different times. That's it.
It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.
That's exactly what union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into other member you switch the "active" status to that other member.
For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It generally leads to undefined behavior is described as producing implementation-defined behavior in C89/90.
EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:
struct VAROBJECT
{
enum o_t { Int, Double, String } objectType;
union
{
int intValue;
double dblValue;
char *strValue;
} value;
} object;
The behavior is undefined from the language point of view. Consider that different platforms can have different constraints in memory alignment and endianness. The code in a big endian versus a little endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...) limiting use.
If you are using C++ (you are using two tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t
and sets the fields appropriately through bitmask operations. The same can be done in C with a function.
Edit: I was expecting AProgrammer to write down an answer to vote and close this one. As some comments have pointed out, endianness is dealt in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are a important point here. The compiler is allowed to make assumptions on the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each color component over the write to the colour variable.
The most common use of union
I regularly come across is aliasing.
Consider the following:
union Vector3f
{
struct{ float x,y,z ; } ;
float elts[3];
}
What does this do? It allows clean, neat access of a Vector3f vec;
's members by either name:
vec.x=vec.y=vec.z=1.f ;
or by integer access into the array
for( int i = 0 ; i < 3 ; i++ )
vec.elts[i]=1.f;
In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.
union A {
int i;
double d;
};
A a[10]; // records in "a" can be either ints or doubles
a[0].i = 42;
a[1].d = 1.23;
Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++ unions are not much use because they can only contain POD types - effectively those without constructors and destructors.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With