
Portable data reinterpretation

Tags: c, type-punning

I want to reinterpret data of one type as another type in a portable way (C99). I am not talking about casting; I want to reinterpret the bytes of some given data. By portable I mean that it does not break C99 rules; I do not mean that the reinterpreted value is the same on all systems.

I know three different ways to reinterpret data, but only two of them are portable:

  1. This is not portable - it breaks the strict aliasing rule.

    /* #1 Type Punning */
    
    float float_value = 3.14;
    int *int_pointer = (int *)&float_value;
    int int_value = *int_pointer;
    
  2. This is platform dependent, because it reads an int value from the union after writing a float into it. But it does not break any C99 rules, so that should work (if sizeof(int) == sizeof(float)).

    /* #2 Union Punning */
    
    union data {
      float float_value;
      int int_value;
    };
    
    union data data_value;
    data_value.float_value = 3.14;
    int int_value = data_value.int_value;
    
  3. This should be fine, as long as sizeof(int) == sizeof(float).

    /* #3 Copying */
    #include <string.h> /* declares memcpy */
    
    float float_value = 3.14;
    int int_value = 0;
    memcpy(&int_value, &float_value, sizeof(int_value));
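
Approach #3 could be wrapped in small helpers along the following lines. This is only a sketch, assuming the platform provides uint32_t and that float is 32 bits wide; the names float_to_bits, bits_to_float and check_round_trip are hypothetical and not part of the question. The typedef is a common C99 idiom for a compile-time size check: the array size becomes -1, which does not compile, if the two sizes differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time check usable in C99: fails to compile if the sizes differ. */
typedef char float_and_uint32_same_size[(sizeof(float) == sizeof(uint32_t)) ? 1 : -1];

/* Hypothetical helper: copy the object representation of a float
   into a uint32_t. */
static uint32_t float_to_bits(float value)
{
    uint32_t bits = 0;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical helper: the inverse copy. */
static float bits_to_float(uint32_t bits)
{
    float value = 0.0f;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Self-check: copying the bytes out and back must give the same value. */
static void check_round_trip(void)
{
    float original = 3.14f;
    assert(bits_to_float(float_to_bits(original)) == original);
}
```

The bit pattern returned by float_to_bits still depends on the platform's float format; only the copy itself is well defined.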
    

My Questions:

  1. Is this correct?
  2. Do you know other ways to reinterpret data in a portable way?

Johannes asked Dec 14 '11 21:12


2 Answers

Solution 2 is portable - type punning through unions has always been legal in C99, and it was made explicit with TC3, which added the following footnote to section 6.5.2.3:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Annex J still lists it as unspecified behaviour, which is a known defect that was corrected in C11, which changed

The value of a union member other than the last one stored into [is unspecified]

to

The values of bytes that correspond to union members other than the one last stored into [are unspecified]

It's not that big a deal as the annex is only informative, not normative.

Keep in mind that you can still end up with undefined behaviour, e.g.

  • by creating a trap representation
  • by violating aliasing rules in case of members with pointer type (which should not be converted via type-punning anyway as there need not be a uniform pointer representation)
  • if the union members have different sizes - only the bytes of the member last used in a store have specified value; in particular, storing values in a smaller member can also invalidate trailing bytes of a larger member
  • if a member contains padding bytes, which always take unspecified values
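
To avoid some of the pitfalls listed above, the union could pair the float with an unsigned integer of the same size. The following is a sketch under assumptions, not a definitive implementation: it assumes uint32_t exists and that float is 32 bits wide, and the names union float_bits and pun_float are hypothetical. An exact-width unsigned type is generally expected to have no padding bits, so reading it should not produce a trap representation.

```c
#include <assert.h>
#include <stdint.h>

/* Both members are expected to occupy exactly the same bytes. */
union float_bits {
    float    as_float;
    uint32_t as_bits;
};

/* Hypothetical helper: store a float, then read the same bytes back
   as an unsigned integer (the "type punning" described in the footnote). */
static uint32_t pun_float(float value)
{
    union float_bits u;

    /* Runtime guard: with different sizes some bytes would be unspecified. */
    assert(sizeof u.as_float == sizeof u.as_bits);

    u.as_float = value; /* store through one member ...  */
    return u.as_bits;   /* ... read through the other    */
}
```

The runtime assert could be replaced by a compile-time check; it is used here only to keep the sketch short.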

Christoph answered Oct 13 '22 22:10


  1. The union solution is as well defined as the memcpy one in C (AFAIK it is undefined behaviour in C++); see DR 283.

  2. A pointer to any object can be cast to a pointer to char, signed char or unsigned char, so

    unsigned char *ptr = (unsigned char*)&floatVar;
    

    and then accessing ptr[0] to ptr[sizeof(floatVar)-1] is legal.
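
As an illustration of the character-pointer approach, the bytes of a float could be copied out and back one at a time. This is a minimal sketch; the helper names float_to_bytes and bytes_to_float are hypothetical and not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the object representation of a float
   into a caller-supplied buffer, one byte at a time. */
static void float_to_bytes(float value, unsigned char out[sizeof(float)])
{
    const unsigned char *ptr = (const unsigned char *)&value;

    assert(out != NULL);
    for (size_t i = 0; i < sizeof value; ++i)
        out[i] = ptr[i]; /* reading through unsigned char * is allowed */
}

/* Hypothetical helper: rebuild a float from bytes produced above. */
static float bytes_to_float(const unsigned char in[sizeof(float)])
{
    float value = 0.0f;
    unsigned char *ptr = (unsigned char *)&value;

    assert(in != NULL);
    for (size_t i = 0; i < sizeof value; ++i)
        ptr[i] = in[i]; /* writing through unsigned char * is allowed too */
    return value;
}
```

The order of the bytes in the buffer follows the platform's memory layout, so the buffer contents are not portable across machines with different endianness or float formats.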

AProgrammer answered Oct 13 '22 22:10