Is it undefined behaviour to read a different member than was written in a Union?

Tags:

c++

union test{
  char a; // 1 byte
  int b;  // 4 bytes
};

int main(){ 
  test t;
  t.a = 5;
  return t.b;
}

This link says: https://en.cppreference.com/w/cpp/language/union

It's undefined behavior to read from the member of the union that wasn't most recently written.

According to this, does my sample code above have UB? If so, what's the point of a union? I thought the whole point is to read/write different value types from the same memory location.

If I only need to access the most recently written value, then I will just use a regular variable and not a union.

Dan asked Sep 12 '25 07:09


2 Answers

Yes, the behaviour is undefined in C++.

When you write a value to a member of a union, think of that member becoming the active member.

The behaviour of reading any member of a union that is not the active member is undefined.

In C++, a union is often coupled with another variable that serves as a means of identifying the active member.
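As a minimal sketch of that coupling, the union from the question can be paired with a tag. The names Kind, Tagged, set_char, set_int, as_int and get_int are made up for this illustration; they are not from any library.

```cpp
#include <cassert>

// Hypothetical names for illustration only.
enum class Kind { Char, Int };

struct Tagged {
  Kind kind;  // records which union member is currently active
  union {
    char a;
    int b;
  };
};

// Writing a member makes it the active one; the tag is updated alongside.
inline void set_char(Tagged& t, char c) { t.a = c; t.kind = Kind::Char; }
inline void set_int(Tagged& t, int i) { t.b = i; t.kind = Kind::Int; }

// Consults the tag first, so only the active member is ever read.
inline int as_int(const Tagged& t) {
  return t.kind == Kind::Char ? static_cast<int>(t.a) : t.b;
}

// Reading b directly is only valid while b is the active member.
inline int get_int(const Tagged& t) {
  assert(t.kind == Kind::Int);
  return t.b;
}
```

With this, the question's example could be written as `Tagged t; set_char(t, 5); return as_int(t);`, which converts the active char member instead of reading the inactive int member.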

Bathsheba answered Sep 13 '25 20:09


Your implication that having unions without the possibility of reading their inactive members makes them useless is wrong. Consider the following simplified implementation of a string class:

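#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen, std::memcpy

class string {
  char* data_;  // points to buffer_ (short string) or to heap storage (long string)
  std::size_t size_;
  union {
    std::size_t capacity_;  // active for long strings
    char buffer_[16];       // active for short strings (up to 15 characters + '\0')
  };

  bool is_short() const { return data_ == buffer_; }
  // ...
public:
  string(const char* str) : size_(std::strlen(str)) {
    if (size_ < 16)
      data_ = buffer_;  // short string, buffer_ will be active
    else {
      capacity_ = size_;  // long string, capacity_ is active
      data_ = new char[capacity_ + 1];
    }
    std::memcpy(data_, str, size_ + 1);  // copies the characters and the terminating '\0'
  }

  std::size_t capacity() const { return is_short() ? 15 : capacity_; }
  const char* data() const { return data_; }
  // ...
};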

Here, if the stored string has fewer than 16 characters, it is stored in buffer_ and data_ points to it. Otherwise, data_ points to a dynamically allocated buffer.

Consequently, you can distinguish between both cases (short/long string) by comparing data_ with buffer_. When the string is short, buffer_ is active and you don't need to read capacity_, since you know it is 15. When the string is long, capacity_ is active and you don't need to read buffer_, since it is unused.

Exactly this approach is used in libstdc++. It is a bit more complicated there, since std::string is just a specialization of the std::basic_string class template, but the idea is the same. Source code from include/bits/basic_string.h:

enum { _S_local_capacity = 15 / sizeof(_CharT) };

union
{
  _CharT    _M_local_buf[_S_local_capacity + 1];
  size_type _M_allocated_capacity;
};

It can save a lot of space if your program works with a lot of strings at once (consider, e.g., databases). Without the union, each string object would take 8 more bytes in memory on a typical 64-bit platform, where size_t is 8 bytes.
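The saving can be checked with a small sketch that mirrors the data members of the simplified string class above. WithUnion, WithoutUnion and saved_bytes are hypothetical names used only for this comparison, and the exact sizes are implementation-defined.

```cpp
#include <cstddef>

// Hypothetical layout with the union: capacity and buffer share storage.
struct WithUnion {
  char* data;
  std::size_t size;
  union {
    std::size_t capacity;
    char buffer[16];
  };
};

// Hypothetical layout without the union: capacity needs its own slot.
struct WithoutUnion {
  char* data;
  std::size_t size;
  std::size_t capacity;
  char buffer[16];
};

// Number of bytes saved per object by overlaying capacity and buffer.
constexpr std::size_t saved_bytes() {
  return sizeof(WithoutUnion) - sizeof(WithUnion);
}

static_assert(sizeof(WithoutUnion) > sizeof(WithUnion),
              "the union layout is expected to be smaller");
```

On a typical 64-bit platform I would expect sizeof(WithUnion) to be 32 and sizeof(WithoutUnion) to be 40, so saved_bytes() would be 8; other platforms may differ.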

Daniel Langr answered Sep 13 '25 20:09