 

A question about union in C - store as one type and read as another - is it implementation defined?

I was reading about unions in K&R. As far as I understand, a union variable can hold any one of several types, and if something is stored as one type and extracted as another, the result is purely implementation-defined.

Now please check this code snippet:

    #include <stdio.h>

    int main(void)
    {
        union a
        {
            int i;
            char ch[2];
        };

        union a u;
        u.ch[0] = 3;
        u.ch[1] = 2;

        printf("%d %d %d\n", u.ch[0], u.ch[1], u.i);

        return 0;
    }

Output:

3 2 515 

Here I am assigning values to u.ch but reading from both u.ch and u.i. Is the result implementation-defined? Or am I doing something really silly?

I know this may seem like a beginner question to most people, but I cannot figure out the reason behind that output.

Thanks.

whacko__Cracko asked Nov 28 '09 11:11




2 Answers

This is undefined behaviour. u.i and u.ch are located at the same memory address, so the result of writing into one and reading from the other depends on the compiler, platform, architecture, and sometimes even the compiler's optimization level. The output for u.i may therefore not always be 515.

Example

For example, gcc on my machine produces two different answers for -O0 and -O2.

  1. Because my machine has a 32-bit little-endian architecture, with -O0 I end up with the two least significant bytes set to 3 and 2, while the two most significant bytes are uninitialized. So the union's memory looks like this: {3, 2, garbage, garbage}

    Hence I get output similar to 3 2 -1216937469.

  2. With -O2, I get the output 3 2 515 like you do, which makes the union's memory {3, 2, 0, 0}. What happens is that gcc replaces the arguments to printf with the actual values, so the assembly output is equivalent to:

        #include <stdio.h>

        int main()
        {
            printf("%d %d %d\n", 3, 2, 515);
            return 0;
        }

    The value 515 can be obtained as explained in other answers to this question (515 = 2 × 256 + 3, with ch[0] as the least significant byte). In essence, when gcc optimized the call it chose zeroes as the arbitrary value of the otherwise uninitialized bytes of the union.

Writing to one union member and reading from another usually does not make much sense, but sometimes it may be useful for programs compiled with strict aliasing.

Alex B answered Oct 13 '22 13:10


The answer to this question depends on the historical context, since the language specification has changed over time, and this is one of the areas affected by those changes.

You said that you were reading K&R. The latest edition of that book (as of now) describes the first standardized version of the C language, C89/90. In that version of C, writing one member of a union and reading another member is undefined behavior. Not implementation-defined (which is a different thing), but undefined behavior. The relevant portion of the language standard in this case is 6.5/7.

Now, at some later point in the evolution of C (the C99 version of the language specification with Technical Corrigendum 3 applied), it became legal to use a union for type punning, i.e. to write one member of the union and then read another.

Note that attempting to do that can still lead to undefined behavior. If the value you read happens to be invalid (a so-called "trap representation") for the type you read it through, then the behavior is still undefined. Otherwise, the value you read is implementation-defined.

Your specific example is relatively safe for type punning from int to a char[2] array. It is always legal in C to reinterpret the content of any object as a char array (again, 6.5/7).

However, the reverse is not true. Writing data into the char[2] array member of your union and then reading it as an int can potentially create a trap representation and lead to undefined behavior. The potential danger exists even if your char array is long enough to cover the entire int.

But in your specific case, if int happens to be larger than char[2], the int you read will cover an uninitialized area beyond the end of the array, which again leads to undefined behavior.

AnT answered Oct 13 '22 13:10