
What is the correct way to check equality between instances of a union?

Tags: c++, c, unions

I have a multithreaded application that stores data as an array of instances of the following union

union unMember {
    float fData;
    unsigned int uiData;
};

The object that stores this array knows what type the data in the union is, so I don't have problems with UB when retrieving the correct type. However, in other parts of the program I need to test equality between 2 instances of these unions, and in that part of the code the true internal data type is not known. As a result, I can't test equality of the unions using this kind of approach

  unMember un1;
  unMember un2;
  if (un1 == un2) {
     // do stuff
  }

as I get compiler errors. As such, I simply compare the float part of the union

  if (un1.fData == un2.fData) {
     // compiles but is it valid?
  }

Now, given that I have read that it is UB to access any member of a union other than the one that was last written to (that is cumbersomely written, but I can think of no more articulate way to say it), I am wondering whether the code above is a valid way to check equality of my union instances.

This has made me realise that internally I have no idea how unions really work. I had assumed that data was simply stored as a bit pattern and that you could interpret that in whatever way you like depending on the types listed in the union. If this is not the case, what is a safe/correct way to test equality of 2 instances of a union?

Finally, my application is written in C++ but I realise that unions are also part of C, so is there any difference in how they are treated by the 2 languages?

mathematician1975 asked Aug 30 '12 11:08



2 Answers

In general, you need to prepend some kind of indicator of the current union type:

struct myData
{
    int dataType;
    union {
        ...
    } u;
};

Then:

if (un1.dataType != un2.dataType)
    return (1 == 0);
switch(un1.dataType)
{
    case TYPE_1:
        return (un1.u.type1 == un2.u.type1);
    case TYPE_2:
        ...
}
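As a minimal sketch of the tagged approach above, applied to the float/unsigned int union from the question: the names `DataType`, `TaggedMember`, `sameValue`, `makeFloat` and `makeUint` are hypothetical and only serve the illustration.

```cpp
// Hypothetical tag values; the answer above only names TYPE_1 and TYPE_2.
enum DataType { TYPE_FLOAT, TYPE_UINT };

struct TaggedMember {
    DataType dataType;   // records which union member was last written
    union {
        float fData;
        unsigned int uiData;
    } u;
};

// Two values are equal only if their tags match and the active members match.
// Only the member named by the tag is ever read.
bool sameValue(const TaggedMember& a, const TaggedMember& b) {
    if (a.dataType != b.dataType)
        return false;
    switch (a.dataType) {
        case TYPE_FLOAT: return a.u.fData == b.u.fData;
        case TYPE_UINT:  return a.u.uiData == b.u.uiData;
    }
    return false;  // not reached for valid tags
}

// Helpers that keep the tag and the stored member in sync.
TaggedMember makeFloat(float f) {
    TaggedMember m;
    m.dataType = TYPE_FLOAT;
    m.u.fData = f;
    return m;
}

TaggedMember makeUint(unsigned int v) {
    TaggedMember m;
    m.dataType = TYPE_UINT;
    m.u.uiData = v;
    return m;
}
```

With these helpers, `sameValue(makeFloat(1.0f), makeUint(1u))` is false because the tags differ, regardless of the stored bit patterns.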

Anyway, the syntax

if (un1.fData == un2.fData) {
    // compiles but is it valid?
}

which does compile and is valid, might not work for two reasons. The first is that, as you said, un2 may contain an integer rather than a float; in that case, though, the equality test will normally fail anyway. The second is that both unions may hold floats that represent the same number up to a slight rounding error. The test will then report the numbers as different (bit by bit, they are), even though their "meaning" is the same.

Floating points are usually compared like

if (fabs(f1 - f2) < error)

to avoid this pitfall.
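The tolerance test can be wrapped in a small helper, sketched here. The name `nearlyEqual` and the default tolerance of `1e-6f` are assumptions for illustration; a suitable tolerance depends on the magnitude of the data being compared.

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by less than `tolerance`.
// This is an absolute tolerance, so it does not scale with the size of the values.
bool nearlyEqual(float a, float b, float tolerance = 1e-6f) {
    return std::fabs(a - b) < tolerance;
}
```

In the question's terms, the comparison would then read `nearlyEqual(un1.fData, un2.fData)`, provided both unions are known to hold a float.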

LSerni answered Nov 15 '22 15:11


In C++, members that are not the last member written to are considered to be uninitialized (and so reading them is undefined behaviour). In C, they are considered to contain the object representation of the member that was written to, which may or may not be a valid object representation.

That is,

union U {
    S x;
    T y;
} u;
u.x = 0;
T t = u.y;    // C++ - reading uninitialized memory - could crash
T t = u.y;    /* C - reading object representation of u.x - could crash */

In practice, reading a non-assigned union member in C++ will behave the same as in C if the code is sufficiently remote from the code that wrote the assigned member, because the only way for the compiler to generate code that behaves differently is to optimize the write-read combination.

A safe method in both languages (guaranteed not to crash) is to compare the memory contents as an array of char e.g. using memcmp:

union U u1, u2;
u1.x = 0;
u2.x = 0;

memcmp(&u1, &u2, sizeof(union U));

This may not, however, reflect the actual equality of the union members; e.g. for floating-point types, two NaN values can have the same memory representation and compare unequal, while -0.0 and 0.0 (negative and positive zero) have different memory representations but compare equal. There is also the issue of the two types having different sizes, or containing bits that do not participate in the value (padding bits, not an issue on most modern commodity platforms). In addition, struct types can contain padding for alignment.
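The NaN and signed-zero cases can be seen with the union from the question. This is a sketch that assumes `float` and `unsigned int` have the same size and that floats follow IEEE 754, as on most commodity platforms; `sameBytes` and `fromFloat` are hypothetical helper names.

```cpp
#include <cstring>
#include <limits>

union unMember {
    float fData;
    unsigned int uiData;
};

// Bytewise comparison: true when both objects have identical memory contents.
bool sameBytes(const unMember& a, const unMember& b) {
    return std::memcmp(&a, &b, sizeof(unMember)) == 0;
}

// Build a union whose active member is the float.
unMember fromFloat(float f) {
    unMember m;
    m.fData = f;
    return m;
}
```

Under those assumptions, `fromFloat(0.0f)` and `fromFloat(-0.0f)` compare equal through `fData` but differ bytewise, while a NaN value and a copy of it are identical bytewise but compare unequal through `fData`.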

ecatmur answered Nov 15 '22 15:11