Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Type punning and Unions in C

Tags:

c

I'm currently working on a project to build a small compiler just for the heck of it.

I've decided to take the approach of building an extremely simple virtual machine to target so I don't have to worry about learning the ins and outs of elf, intel assembly, etc.

My question is about type punning in C using unions. I've decided to only support 32 bit integers and 32 bit float values in the vm's memory. To facilitate this, the "main memory" of the vm is set up like this:

typedef union
{    
    int i;
    float f;
}word;


memory = (word *)malloc(mem_size * sizeof(word));

So I can then just treat the memory section as either an int or a float depending on the instruction.

Is this technically type punning? It certainly would be if I were to use ints as the words of memory and then use a float* to treat them like floats. My current approach, while syntactically different, I don't think is semantically different. In the end I'm still treating 32 bits in memory as either an int or a float.

The only information I could come up with online suggests that this is implementation dependent. Is there a more portable way to acheive this without wasting a bunch of space?

I could do the following, but then I would be taking up more than 2 times as much memory and "reinventing the wheel" with respect to unions.

typedef struct
{
    int i;
    float f;
    char is_int;
}

Edit

I perhaps didn't make my exact question clear. I am aware that I can use either a float or an int from a union without undefined behavior. What I'm after is specifically a way to have a 32 bit memory location that I can safely use as an int or float without knowing what the last value set was. I want to account for the situation where the other type is used.

like image 771
David Mason Avatar asked Jul 11 '12 22:07

David Mason


2 Answers

As long as you only access the member (int or float) which was most recently stored, there's no problem and no real implementation dependency. It's perfectly safe and well-defined to store a value in a union member and then read that same member.

(Note that there's no guarantee that int and float are the same size, though they are on every system I've seen.)

If you store a value in one member and then read the other, that's type punning. Quoting a footnote in the latest C11 draft:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

like image 92
Keith Thompson Avatar answered Oct 27 '22 00:10

Keith Thompson


Yes, storing one member of union and reading another is type punning (assuming the types are sufficiently different). Moreover, this is the only kind of universal (any type to any type) type punning that is officially supported by C language. It is supported in a sense that the language promises that in this case the type punning will actually occur, i.e. that a physical attempt to read an object of one type as an object of another type will take place. Among other things it means that writing one member of the union and reading another member implies a data dependency between the write and the read. This, however, still leaves you with the burden of ensuring that the type punning does not produce a trap representation.

When you use casted pointers for type punning (what is usually understood as "classic" type punning), the language explicitly states that in general case the behavior is undefined (aside from reinterpreting object's value as an array of chars and other restricted cases). Compilers like GCC implement so called "strict aliasing semantics", which basically means that the pointer-based type punning might not work as you expect it to work. For example, the compiler might (and will) ignore the data dependency between type-punned reads and writes and rearrange them arbitrarily, thus completely ruining your intent. This

int i;
float f;

i = 5;
f = *(float *) &i;

can be easily rearranged into actual

f = *(float *) &i;
i = 5;

specifically because a strict-aliased compiler deliberately ignores the possibility of data dependency between the write and the read in the example.

In a modern C compiler, when you really need to perform physical reinterpretation of one objects value as value of another type, you are restricted to either memcpy-ing bytes from one object to another or to union-based type punning. There are no other ways. Casting pointers is no longer a viable option.

like image 26
AnT Avatar answered Oct 27 '22 00:10

AnT