
Is it safe to detect endianness with a union?

In other words, according to the C standard, is this code safe? (Assume uint8_t is one byte)

#include <stdint.h>
#include <stdio.h>

void detectEndianness(void){
    union {
        uint16_t w;
        uint8_t b;
    } a;
    a.w = 0x00FFU;
    if (a.b == 0xFFU) {
        puts("Little endian.");
    }
    else if (a.b == 0U) {
        puts("Big endian.");
    }
    else {
        puts("Stack Overflow endian.");
    }
}

What if I change it to this? Note the third case, which I'm aware of.

a.w = 1U;
if (a.b == 1U) { puts("Little endian."); }
else if (a.b == 0U) { puts("Big endian."); }
else if (a.b == 0x80U) { /* Special potential */ }
else { puts("Stack Overflow endian."); }
iBug asked Nov 22 '17 09:11




2 Answers

Quoting from n1570:

6.5.2.3 Structure and union members - p3

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, and is an lvalue if the first expression is an lvalue.

6.2.6.1 General (under 6.2.6 Representations of types) - p7

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So it is allowed. Your use case could even be considered an intended one if footnote 95 is taken into account (although footnotes are only informative):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Now, the uintN_t family of types is defined to have no padding bits:

7.20.1.1 Exact-width integer types - p2

The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

Consequently, all of their bit patterns are valid values and no trap representations are possible. So we must conclude that the code does indeed check the endianness of uint16_t.
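
The reasoning above can be put into a form that returns a result instead of printing it. This is only a minimal sketch: the names byte_order, ORDER_LITTLE, ORDER_BIG, ORDER_OTHER and detect_byte_order are hypothetical and do not come from the question. It assumes an implementation that provides uint8_t and uint16_t, which are optional types.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of the original question. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

enum byte_order detect_byte_order(void)
{
    union {
        uint16_t w;
        uint8_t  b;
    } a;

    a.w = 0x00FFU;             /* store through the 16-bit member */

    /* Read the byte at the lowest address through the 8-bit member.
       uint8_t has no padding bits, so this read cannot trap. */
    if (a.b == 0xFFU)
        return ORDER_LITTLE;   /* low-order byte is stored first */
    if (a.b == 0x00U)
        return ORDER_BIG;      /* high-order byte is stored first */
    return ORDER_OTHER;        /* some other layout of uint16_t */
}
```

A caller could then branch on the result, for example printing "Little endian." when detect_byte_order() returns ORDER_LITTLE. As in the question, only the byte order of uint16_t is probed; wider types could in principle be laid out differently.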

StoryTeller - Unslander Monica answered Oct 21 '22 09:10


The standard (available in the linked online draft) says in a footnote that reading a union member other than the one last written is allowed:

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

But the footnote also mentions a possible trap representation, and the only data type that the standard guarantees to be safe with respect to trap representations is unsigned char. Accessing a trap representation may be undefined behaviour; although I don't think that uint8_t can hold a trap representation on your platform, whether accessing this member is UB is actually implementation-dependent.
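
If the concern about trap representations is taken at face value, one way to sidestep it is to inspect the bytes only through unsigned char. The following is a minimal sketch of that idea, not code from the question: the function name first_byte_of_one is hypothetical, and it copies the object representation with memcpy rather than using a union.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the byte stored at the lowest address
   of a uint16_t holding the value 1. */
unsigned char first_byte_of_one(void)
{
    uint16_t w = 1U;
    unsigned char bytes[sizeof w];

    /* Copy the object representation into an unsigned char array;
       reading unsigned char can never hit a trap representation. */
    memcpy(bytes, &w, sizeof w);

    return bytes[0];
}
```

On a little-endian implementation this should return 1, and on a big-endian one 0; any other value would correspond to the "Stack Overflow endian." branch in the question.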

Stephan Lechner answered Oct 21 '22 09:10