
When would anyone use a union? Is it a remnant from the C-only days?

Tags:

c++

c

unions



Unions are usually used in the company of a discriminator: a variable indicating which of the fields of the union is valid. For example, let's say you want to create your own Variant type:

struct my_variant_t {
    int type;
    union {
        char char_value;
        short short_value;
        int int_value;
        long long_value;
        float float_value;
        double double_value;
        void* ptr_value;
    };
};

Then you would use it such as:

/* construct a new float variant instance */
void init_float(struct my_variant_t* v, float initial_value) {
    v->type = VAR_FLOAT;
    v->float_value = initial_value;
}

/* Increments the value of the variant by the given int */
void inc_variant_by_int(struct my_variant_t* v, int n) {
    switch (v->type) {
    case VAR_FLOAT:
        v->float_value += n;
        break;

    case VAR_INT:
        v->int_value += n;
        break;
    ...
    }
}
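For reference, here is a minimal self-contained sketch of the same idea that compiles on its own. The enum var_type, the struct name variant, and the member name u are assumptions made for illustration (the snippets above only mention VAR_FLOAT and VAR_INT), and the union is given a name so the code does not depend on C11 anonymous unions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical discriminator values; only VAR_INT and VAR_FLOAT
   appear in the text above. */
enum var_type { VAR_INT, VAR_FLOAT };

struct variant {
    enum var_type type;   /* tells which union member is currently valid */
    union {
        int   int_value;
        float float_value;
    } u;
};

void variant_init_int(struct variant *v, int initial) {
    assert(v != NULL);
    v->type = VAR_INT;
    v->u.int_value = initial;
}

void variant_init_float(struct variant *v, float initial) {
    assert(v != NULL);
    v->type = VAR_FLOAT;
    v->u.float_value = initial;
}

/* Adds n to whichever member the discriminator says is active. */
void variant_inc_by_int(struct variant *v, int n) {
    assert(v != NULL);
    switch (v->type) {
    case VAR_INT:
        v->u.int_value += n;
        break;
    case VAR_FLOAT:
        v->u.float_value += (float)n;
        break;
    }
}
```

The key point is that every read of the union goes through the switch on type, so a member is never read unless it was the last one written.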

This is actually a pretty common idiom, especially in Visual Basic internals.

For a real example, see SDL's SDL_Event union (actual source code here). There is a type field at the top of the union, and the same field is repeated in every SDL_*Event struct. To handle an event correctly, you first check the value of the type field.

The benefits are simple: there is one single data type to handle all event types without using unnecessary memory.
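The pattern described here can be sketched as follows. This is not SDL's actual definition: the names event_type, key_event, mouse_move_event, and handle_event are hypothetical, and the "handling" just returns a number so the dispatch is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, loosely modelled on the pattern above. */
enum event_type { EVENT_KEY, EVENT_MOUSE_MOVE };

struct key_event {
    enum event_type type;   /* always EVENT_KEY */
    int keycode;
};

struct mouse_move_event {
    enum event_type type;   /* always EVENT_MOUSE_MOVE */
    int x, y;
};

/* Every member starts with the same type field, so the field can be
   read through ev->type no matter which member was last written. */
union event {
    enum event_type type;
    struct key_event key;
    struct mouse_move_event motion;
};

/* Stand-in for real handling: returns a value derived from the event. */
int handle_event(const union event *ev) {
    assert(ev != NULL);
    switch (ev->type) {
    case EVENT_KEY:
        return ev->key.keycode;
    case EVENT_MOUSE_MOVE:
        return ev->motion.x + ev->motion.y;
    }
    return -1;   /* unknown event type */
}
```

A single union event can then sit in a queue or be passed to one handler function, and its size is that of the largest event struct rather than the sum of all of them.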


I find C++ unions pretty cool. It seems that people usually only think of the use case where one wants to change the value of a union instance "in place" (which, it seems, serves only to save memory or perform doubtful conversions).

In fact, unions can be of great power as a software engineering tool, even when you never change the value of any union instance.

Use case 1: the chameleon

With unions, you can regroup a number of arbitrary classes under one denomination, which isn't without similarities with the case of a base class and its derived classes. What changes, however, is what you can and can't do with a given union instance:

struct Batman { /* ... */ };       // must be complete types to be union members
struct BaseballBat { /* ... */ };

union Bat
{
    Batman brucewayne;
    BaseballBat club;
};

ReturnType1 f(void)
{
    BaseballBat bb = {/* */};
    Bat b;
    b.club = bb;
    // do something with b.club
}

ReturnType2 g(Bat& b)
{
    // do something with b, but how do we know what's inside?
}

Bat returnsBat(void);
ReturnType3 h(void)
{
    Bat b = returnsBat();
    // do something with b, but how do we know what's inside?
}

It appears that the programmer has to be certain of the type of the content of a given union instance before using it. That is the case in function f above. However, if a function receives a union instance as an argument, as g does above, then it doesn't know what to do with it. The same applies to functions returning a union instance, see h: how does the caller know what's inside?

If a union instance never gets passed as an argument or as a return value, then it's bound to have a very monotonous life, with spikes of excitement when the programmer chooses to change its content:

Batman bm = {/* */};
BaseballBat bb = {/* */};
Bat b;
b.brucewayne = bm;
// stuff
b.club = bb;

And that's the most (un)popular use case of unions. Another use case is when a union instance comes along with something that tells you its type.

Use case 2: "Nice to meet you, I'm object, from Class"

Suppose a programmer elected to always pair up a union instance with a type descriptor (I'll leave it to the reader's discretion to imagine an implementation for one such object). This defeats the purpose of the union itself if what the programmer wants is to save memory and the size of the type descriptor is not negligible with respect to that of the union. But let's suppose that it's crucial that the union instance can be passed as an argument or returned as a value without the callee or caller knowing what's inside.

Then the programmer has to write a switch control flow statement to tell Bruce Wayne apart from a wooden stick, or something equivalent. It's not too bad when there are only two types of contents in the union but obviously, the union doesn't scale anymore.

Use case 3:

As the authors of a recommendation for the ISO C++ Standard put it back in 2008,

Many important problem domains require either large numbers of objects or limited memory resources. In these situations conserving space is very important, and a union is often a perfect way to do that. In fact, a common use case is the situation where a union never changes its active member during its lifetime. It can be constructed, copied, and destructed as if it were a struct containing only one member. A typical application of this would be to create a heterogeneous collection of unrelated types which are not dynamically allocated (perhaps they are in-place constructed in a map, or members of an array).

And now, an example, with a UML class diagram:

[Figure: UML class diagram showing many compositions for class A]

The situation in plain English: an object of class A can have objects of any class among B1, ..., Bn, and at most one of each type, with n being a pretty big number, say at least 10.

We don't want to add fields (data members) to A like so:

private:
    B1 b1;
    .
    .
    .
    Bn bn;

because n might vary (we might want to add Bx classes to the mix), because this would cause a mess with constructors, and because A objects would take up a lot of space.

We could use a wacky container of void* pointers to Bx objects with casts to retrieve them, but that's fugly and so C-style... but more importantly that would leave us with the lifetimes of many dynamically allocated objects to manage.

Instead, what can be done is this:

union Bee
{
    B1 b1;
    .
    .
    .
    Bn bn;
};

enum BeesTypes { TYPE_B1, ..., TYPE_BN };

class A
{
private:
    std::unordered_map<int, Bee> data; // C++11, otherwise use std::map

public:
    Bee get(int); // the implementation is obvious: get from the unordered map
};

Then, to get the content of a union instance from data, you use a.get(TYPE_B2).b2 and the like, where a is a class A instance.

This is all the more powerful since unions are unrestricted in C++11. See the document linked to above or this article for details.


One example is in the embedded realm, where each bit of a register may mean something different. For example, a union of an 8-bit integer and a structure with 8 separate 1-bit bitfields allows you to either change one bit or the entire byte.
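A minimal sketch of such a register view, assuming a hypothetical 8-bit control register. The names control_reg, reg_write, and reg_set_b3 are made up for illustration. Bit-fields of type uint8_t and the order of bits within the byte are implementation-defined, so on a real target you would check the compiler's documentation before relying on which bit b3 maps to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit control register with two views of the same byte. */
union control_reg {
    uint8_t byte;            /* whole-register view */
    struct {
        uint8_t b0 : 1;
        uint8_t b1 : 1;
        uint8_t b2 : 1;
        uint8_t b3 : 1;
        uint8_t b4 : 1;
        uint8_t b5 : 1;
        uint8_t b6 : 1;
        uint8_t b7 : 1;
    } bits;                  /* per-bit view */
};

/* Overwrites all eight flags at once through the byte view. */
void reg_write(union control_reg *r, uint8_t value) {
    assert(r != NULL);
    r->byte = value;
}

/* Sets one flag through the bit-field view, leaving the others alone. */
void reg_set_b3(union control_reg *r) {
    assert(r != NULL);
    r->bits.b3 = 1;
}
```

Reading one member after writing another is permitted in C, where the bytes are reinterpreted; in C++ the same code relies on compiler-specific behaviour.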


Herb Sutter wrote in GOTW about six years ago, with emphasis added:

"But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see... ."

And for a less useful example, see the long but inconclusive question gcc, strict-aliasing, and casting through a union.
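To make the small string optimization from the quote concrete, here is a rough sketch in C. It is not how any particular standard library does it: the names small_string and SMALL_CAP, the 15-character inline capacity, and the separate is_small flag are all assumptions chosen for readability. Real implementations typically pack the discriminator into existing fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_CAP 15   /* hypothetical inline capacity, excluding the terminator */

struct small_string {
    size_t length;
    int is_small;      /* discriminator: 1 = inline buffer, 0 = heap pointer */
    union {
        char  inline_buf[SMALL_CAP + 1];   /* short strings live here */
        char *heap_buf;                    /* long strings live on the heap */
    } data;
};

/* Copies text into s. Returns 0 on success, -1 if allocation fails. */
int small_string_init(struct small_string *s, const char *text) {
    assert(s != NULL && text != NULL);
    s->length = strlen(text);
    if (s->length <= SMALL_CAP) {
        s->is_small = 1;
        memcpy(s->data.inline_buf, text, s->length + 1);
        return 0;
    }
    s->is_small = 0;
    s->data.heap_buf = malloc(s->length + 1);
    if (s->data.heap_buf == NULL) {
        return -1;
    }
    memcpy(s->data.heap_buf, text, s->length + 1);
    return 0;
}

/* Returns the characters, wherever they are stored. */
const char *small_string_cstr(const struct small_string *s) {
    assert(s != NULL);
    return s->is_small ? s->data.inline_buf : s->data.heap_buf;
}

/* Releases the heap buffer, if there is one. */
void small_string_free(struct small_string *s) {
    assert(s != NULL);
    if (!s->is_small) {
        free(s->data.heap_buf);
    }
}
```

The inline buffer and the heap pointer are never needed at the same time, which is exactly the situation where overlapping them in a union costs nothing.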


Well, one example use case I can think of is this:

typedef union
{
    struct
    {
        uint8_t a;
        uint8_t b;
        uint8_t c;
        uint8_t d;
    };
    uint32_t x;
} some32bittype;

You can then access the 8-bit separate parts of that 32-bit block of data; however, prepare to potentially be bitten by endianness.

This is just one hypothetical example, but whenever you want to split data in a field into component parts like this, you could use a union.

That said, there is also a method which is endian-safe, for example:

uint32_t x;   /* some 32-bit value */
uint8_t a = (x & 0xFF000000) >> 24;

This works because shifts and masks operate on the integer's value rather than on its byte layout in memory, so the result is the same on any endianness.
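The difference between the two approaches can be sketched like this. The names word32, msb_by_shift, first_byte_in_memory, and host_is_little_endian are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

union word32 {
    uint32_t x;
    uint8_t  bytes[4];
};

/* Most significant byte via shift: depends only on the value of x. */
uint8_t msb_by_shift(uint32_t x) {
    return (uint8_t)(x >> 24);
}

/* First byte in memory via the union: depends on the host's byte order. */
uint8_t first_byte_in_memory(uint32_t x) {
    union word32 w;
    assert(sizeof w.bytes == sizeof w.x);   /* both views cover the same 4 bytes */
    w.x = x;
    return w.bytes[0];
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void) {
    return first_byte_in_memory(1u) == 1;
}
```

For 0x12345678, msb_by_shift gives 0x12 everywhere, while first_byte_in_memory gives 0x78 on a little-endian machine and 0x12 on a big-endian one.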


Some uses for unions:

  • Provide a general endianness interface to an unknown external host.
  • Manipulate foreign CPU architecture floating point data, such as accepting VAX G_FLOATS from a network link and converting them to IEEE 754 long reals for processing.
  • Provide straightforward bit twiddling access to a higher-level type.
    union {
        unsigned char   byte_v[16];
        long double     ld_v;
    };

With this declaration, it is simple to display the hex byte values of a long double, change the exponent's sign, determine if it is a denormal value, or implement long double arithmetic for a CPU which does not support it, etc.

  • Saving storage space when fields are dependent on certain values:

    class person {
        string name;

        char gender;   // M = male, F = female, O = other
        union {
            date  vasectomized;  // for males
            int   pregnancies;   // for females
        } gender_specific_data;
    };
    
  • Grep the include files for use with your compiler. You'll find dozens to hundreds of uses of union:

    [wally@zenetfedora ~]$ cd /usr/include
    [wally@zenetfedora include]$ grep -w union *
    a.out.h:  union
    argp.h:   parsing options, getopt is called with the union of all the argp
    bfd.h:  union
    bfd.h:  union
    bfd.h:union internal_auxent;
    bfd.h:  (bfd *, struct bfd_symbol *, int, union internal_auxent *);
    bfd.h:  union {
    bfd.h:  /* The value of the symbol.  This really should be a union of a
    bfd.h:  union
    bfd.h:  union
    bfdlink.h:  /* A union of information depending upon the type.  */
    bfdlink.h:  union
    bfdlink.h:       this field.  This field is present in all of the union element
    bfdlink.h:       the union; this structure is a major space user in the
    bfdlink.h:  union
    bfdlink.h:  union
    curses.h:    union
    db_cxx.h:// 4201: nameless struct/union
    elf.h:  union
    elf.h:  union
    elf.h:  union
    elf.h:  union
    elf.h:typedef union
    _G_config.h:typedef union
    gcrypt.h:  union
    gcrypt.h:    union
    gcrypt.h:    union
    gmp-i386.h:  union {
    ieee754.h:union ieee754_float
    ieee754.h:union ieee754_double
    ieee754.h:union ieee854_long_double
    ifaddrs.h:  union
    jpeglib.h:  union {
    ldap.h: union mod_vals_u {
    ncurses.h:    union
    newt.h:    union {
    obstack.h:  union
    pi-file.h:  union {
    resolv.h:   union {
    signal.h:extern int sigqueue (__pid_t __pid, int __sig, __const union sigval __val)
    stdlib.h:/* Lots of hair to allow traditional BSD use of `union wait'
    stdlib.h:  (__extension__ (((union { __typeof(status) __in; int __i; }) \
    stdlib.h:/* This is the type of the argument to `wait'.  The funky union
    stdlib.h:   causes redeclarations with either `int *' or `union wait *' to be
    stdlib.h:typedef union
    stdlib.h:    union wait *__uptr;
    stdlib.h:  } __WAIT_STATUS __attribute__ ((__transparent_union__));
    thread_db.h:  union
    thread_db.h:  union
    tiffio.h:   union {
    wchar.h:  union
    xf86drm.h:typedef union _drmVBlank {
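As a small illustration of the bit-twiddling item in the list above, the byte_v / ld_v union can be used to print the bytes of a long double. This is only a sketch: the names ld_bytes and long_double_to_hex are hypothetical, the array is sized with sizeof rather than a fixed 16 because the size of long double differs between platforms, and any padding bytes inside the long double may hold arbitrary values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

union ld_bytes {
    unsigned char byte_v[sizeof(long double)];
    long double   ld_v;
};

/* Writes the bytes of value as hex digits into out, lowest address first.
   out must have room for 2 * sizeof(long double) + 1 characters. */
void long_double_to_hex(long double value, char *out) {
    union ld_bytes u;
    size_t i;

    assert(out != NULL);
    memset(&u, 0, sizeof u);
    u.ld_v = value;
    for (i = 0; i < sizeof u.byte_v; ++i) {
        /* Each byte becomes exactly two hex digits plus a terminator. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned int)u.byte_v[i]);
    }
}
```

Which bytes hold the sign and exponent depends on the platform's long double format and byte order, so interpreting the output requires knowing the target.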
    

Unions are useful when dealing with byte-level (low level) data.

One of my recent uses was modeling IP addresses, which looks like this:

// Composite structure for IP address storage
union
{
    // IPv4 @ 32-bit identifier
    // Padded 12-bytes for IPv6 compatibility
    union
    {
        struct
        {
            unsigned char _reserved[12];
            unsigned char _IpBytes[4];
        } _Raw;

        struct
        {
            unsigned char _reserved[12];
            unsigned char _o1;
            unsigned char _o2;
            unsigned char _o3;
            unsigned char _o4;    
        } _Octet;    
    } _IPv4;

    // IPv6 @ 128-bit identifier
    // Next generation internet addressing
    union
    {
        struct
        {
            unsigned char _IpBytes[16];
        } _Raw;

        struct
        {
            unsigned short _w1;
            unsigned short _w2;
            unsigned short _w3;
            unsigned short _w4;
            unsigned short _w5;
            unsigned short _w6;
            unsigned short _w7;
            unsigned short _w8;   
        } _Word;
    } _IPv6;
} _IP;
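A simplified sketch of how such a layout might be used. The type ip_address and the function ip_set_v4 are hypothetical and deliberately smaller than the structure above: only the raw 16-byte view and the padded IPv4 view are kept.

```c
#include <assert.h>
#include <string.h>

/* An IPv4 address occupies the last 4 of 16 bytes, so both address
   families share one storage block. */
union ip_address {
    unsigned char raw[16];            /* IPv6 view: all 16 bytes */
    struct {
        unsigned char reserved[12];   /* unused for IPv4 */
        unsigned char octet[4];       /* IPv4 view: a.b.c.d */
    } v4;
};

/* Stores a.b.c.d and clears the 12 reserved bytes. */
void ip_set_v4(union ip_address *ip, unsigned char a, unsigned char b,
               unsigned char c, unsigned char d) {
    assert(ip != NULL);
    memset(ip->raw, 0, sizeof ip->raw);
    ip->v4.octet[0] = a;
    ip->v4.octet[1] = b;
    ip->v4.octet[2] = c;
    ip->v4.octet[3] = d;
}
```

Because every member here is an array of unsigned char, the views line up byte for byte on any platform. The unsigned short word view in the original is different: which byte of each word comes first depends on the host's endianness.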