
How to memset an anonymous union with 0

How should I zero out an anonymous union? I couldn't find anything about it on the cppreference page. Would memsetting its largest member to 0 work here?

For example -

#include <iostream>
#include <cstring>

struct s{
    char a;
    char b[100];
};

int main(){
 union {
   int a;
   s b;
   char c;
 };

  // b.a = 'a'; (1)

  std::memset(&b, 0, sizeof(b));

  std::cout << a << "\n";
  std::cout << b.a << " " << b.b << "\n";
  std::cout << c << "\n";
}

Also, if this works, should I uncomment (1) before calling memset() to activate the largest member?

Abhinav Gauniyal asked Feb 21 '17 07:02



3 Answers

If you really want to respect the standard, you should know that the code you have written is undefined behaviour. C++ standard §3.8 [basic.life]:

... except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union (8.6.1, 12.6.2), or as described in 9.3. The lifetime of an object o of type T ends when: (1.3) — if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or (1.4) — the storage which the object occupies is released, or is reused by an object that is not nested within o (1.8).

§9.3 explains that you can activate a member of a standard-layout union by assigning to it. It also explains that you can inspect the value of a union member that is not active only when certain criteria are met:

If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 9.2. — end note ]

So when you write std::cout << a << "\n", you have neither initialized a nor activated it by an assignment, and no member has been initialized, so this is undefined behaviour. (Note: the compilers I know of support it, at least on PC, as an extension to the standard.)

So before using a you will have to write a = 0, or make a the initialized member of the union, because a does not share a common initial sequence with either b or c.
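
As a minimal sketch of that rule (the function name `switch_active_member` is hypothetical, not from the question), the snippet below assigns to a member before reading it, then switches the active member by assigning to a different one:

```cpp
#include <cassert>

struct s {
    char a;
    char b[100];
};

// Hypothetical helper, for illustration only.
char switch_active_member() {
    union {
        int a;
        s b;
        char c;
    };

    a = 0;            // the assignment makes `a` the active member
    int first = a;    // reading the active member is well-defined
    assert(first == 0);

    c = 'x';          // the assignment makes `c` the active member instead
    return c;         // `a` must not be read from here on
}
```

Calling `switch_active_member()` never reads a member that was not the target of the most recent assignment, which is the property the answer asks for.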

So even if you use memset, as also proposed in MSalters' answer, you will still have to assign something to a member of the union before using it. If you want to stay in defined behaviour, do not use memset. Note that memset can safely be used with standard-layout objects that are not members of a union, since their lifetime begins when storage is obtained for them.


In conclusion, to stay in defined behaviour you must initialize at least one member; you can then inspect the other members of the union that share a common initial sequence with the initialized member.

  1. If your intent is to use an anonymous union in the main function, you can declare the union static: all static objects are zero-initialized. (They are not reinitialized when the function is called again, but that will not happen with main().)

    int main(){
     static union {
      s b;
      int a;
      char c;
      };
     //...
     }
    

    As described in C++ standard §8.6 article (6.3) [dcl.init]:

    if T is a (possibly cv-qualified) union type, the object’s first non-static named data member is zero-initialized and padding is initialized to zero bits;

  2. Otherwise, if there is no padding between the members of the structure (s), you can aggregate-initialize the largest member (s) with an empty list:

    //...
    int main(){
      union {
       int a;
       s b{};
       char c;
       };
      //...
      }
    

    This works because all members of a union start at the same address. So if there is no padding between the members of s, every byte of the union occupied by s will be zero-initialized. C++ standard §9.3 [class.union] article 2:

    The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. [ Note: A union object and its non-static data members are pointer-interconvertible (3.9.2, 5.2.9). As a consequence, all non-static data members of a union object have the same address.

  3. If there is padding inside s, then just declare an array of char for initialization purposes:

    //...
    int main(){
      union {
       char _initialization[sizeof(s)]{};
       int a;
       s b;
       char c;
       };
      //...
      }
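
To check the effect of option 3, the sketch below wraps the same members in a named union `U` (an assumption made only so the bytes can be examined from a function; the array is renamed `zero_init`, also hypothetical) and reads the array back, which is the active member. Note that where alignment rounds `sizeof(U)` up past `sizeof(s)`, the trailing padding bytes lie outside the array and are not covered by this initializer.

```cpp
#include <cassert>
#include <cstddef>

struct s {
    char a;
    char b[100];
};

union U {
    char zero_init[sizeof(s)]{};  // default member initializer: every element becomes 0
    int a;
    s b;
    char c;
};

static_assert(sizeof(U) >= sizeof(s),
              "a union is at least as large as its largest member");

// Hypothetical helper: returns true if the first sizeof(s) bytes are all zero.
bool leading_bytes_are_zero() {
    U u;  // default-initialization applies the default member initializer
    for (std::size_t i = 0; i < sizeof(s); ++i) {
        if (u.zero_init[i] != 0) return false;
    }
    return true;
}
```

Reading `u.zero_init` here is well-defined because it is the initialized member; any other member still has to be assigned before it is read.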
    

Note: your example, the last two code examples, and the code using memset all produce the exact same set of instructions for initialization (clang, x86_64):

    pushq   %r14
    pushq   %rbx
    subq    $120, %rsp
    xorps   %xmm0, %xmm0
    movaps  %xmm0, 96(%rsp)
    movaps  %xmm0, 80(%rsp)
    movaps  %xmm0, 64(%rsp)
    movaps  %xmm0, 48(%rsp)
    movaps  %xmm0, 32(%rsp)
    movaps  %xmm0, 16(%rsp)
    movq    $0, 109(%rsp)
Oliv answered Oct 11 '22 13:10


Just memset every member, and count on the optimizer to eliminate redundant writes.

MSalters answered Oct 11 '22 14:10


I'll just share an idea: maybe we can use metaprogramming like this:

#include <type_traits> // std::conditional

template<typename T1, typename T2>
struct Bigger
{
  typedef typename std::conditional<sizeof(T1) >= sizeof(T2), T1, T2>::type Type;
};

// Recursion helper
template<typename...>
struct BiggestHelper;

// 2 or more types
template<typename T1, typename T2, typename... TArgs>
struct BiggestHelper<T1, T2, TArgs...>
{
    typedef typename Bigger<T1, typename BiggestHelper<T2, TArgs...>::Type>::Type Type;
};

// Exactly 2 types
template<typename T1, typename T2>
struct BiggestHelper<T1, T2>
{
    typedef typename Bigger<T1, T2>::Type Type;
};

// Exactly one type
template<typename T>
struct BiggestHelper<T>
{
    typedef T Type;
};

template<typename... TArgs>
struct Biggest
{
    typedef typename BiggestHelper<TArgs...>::Type Type;
};

So in the main function we can do this:

std::memset(&b, 0, sizeof(Biggest<int,s,char>::Type));
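
If only the size is needed rather than the type, a shorter alternative is to take the maximum of the `sizeof` values directly. This is a different technique from the recursive templates above, sketched under the assumption of C++14 or later (where the initializer-list overload of `std::max` is `constexpr`); the names `biggest_size` and `zero_storage_demo` are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

struct s {
    char a;
    char b[100];
};

// Hypothetical helper: the largest sizeof among the listed types.
template <typename... Ts>
constexpr std::size_t biggest_size() {
    return std::max({sizeof(Ts)...});
}

static_assert(biggest_size<int, s, char>() == sizeof(s),
              "s is the largest of the three member types");

// Hypothetical demo: the result is a constant expression, so it can size
// a raw byte buffer and serve as the memset length.
bool zero_storage_demo() {
    constexpr std::size_t n = biggest_size<int, s, char>();
    unsigned char storage[n];
    std::memset(storage, 0xFF, n);  // fill with non-zero bytes first
    std::memset(storage, 0, n);     // then zero exactly n bytes
    for (unsigned char byte : storage) {
        if (byte != 0) return false;
    }
    return true;
}
```

With this helper, the call above could be written as `std::memset(&b, 0, biggest_size<int, s, char>());`. As the first answer points out, a member still has to be assigned before it is read.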
Ron Tang answered Oct 11 '22 13:10