
Does accessing an int with a char * potentially have undefined behavior?

The code below, which tests endianness, is expected to have implementation-defined behavior:

int is_little_endian(void) {
    int x = 1;
    char *p = (char*)&x;
    return *p == 1;
}

But could it have undefined behavior on a purposely contrived architecture? For example, could the first byte of the representation of an int with value 1 (or another well-chosen value) be a trap representation for the char type?

As noted in comments, the type unsigned char would not have this issue, as it cannot have trap representations, but this question specifically concerns the char type.
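For comparison, here is a minimal sketch of the same test written with unsigned char, which sidesteps the trap question entirely. The function name is_little_endian_uchar is an illustrative assumption and is not part of the original code.

```c
/* Variant of the test above that reads the first byte through
   unsigned char. unsigned char has no trap representations, so the
   read and the comparison are well defined; only the result is
   implementation-defined. The name is hypothetical. */
int is_little_endian_uchar(void) {
    int x = 1;
    const unsigned char *p = (const unsigned char *)&x;
    return *p == 1;
}
```

The result still depends on the byte order of the implementation; the only difference is that the access itself can no longer be a concern.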

asked Feb 01 '18 19:02 by chqrlie



1 Answer

Per C 2018 6.2.5 15, char behaves as either signed char or unsigned char. Suppose it is signed char. 6.2.6.2 2 discusses signed integer types, including signed char. At the end of this paragraph, it says:
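Which of the two cases applies on a given implementation can be checked from <limits.h>. The sketch below is only an illustration; the function name char_is_signed is an assumption introduced here.

```c
#include <limits.h>

/* Nonzero if plain char has the same range as signed char on this
   implementation, zero if it matches unsigned char.
   CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN otherwise. */
int char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

Only when this returns nonzero does the discussion of signed char trap representations become relevant to plain char.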

Which of these [sign and magnitude, two’s complement, or ones’ complement] applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value.

Thus, this paragraph allows signed char to have a trap representation. However, the paragraph in the standard that says accessing trap representations may have undefined behavior, 6.2.6.1 5, specifically excludes character types:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Thus, although char may have trap representations, this paragraph does not make merely reading one through a char lvalue undefined. That leaves the question of what happens when we use the value in an expression. If a char object holds a trap representation, it does not represent a value, so comparing it to 1 in *p == 1 does not seem to have defined behavior.

The specific value of 1 in an int will not result in a trap representation in char for any normal C implementation: the 1 will be in the “rightmost” (lowest-valued) bit of some byte of the int, and no normal C implementation puts the sign bit of a char in that bit position. However, the C standard apparently does not prohibit such an arrangement, so, theoretically, an int with value 1 might be encoded with bits 00000001 in one of its bytes, and those bits might be a trap representation for a char.
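To see where that 00000001 byte actually lands on a given implementation, the bytes can be inspected through unsigned char, which is always safe to read. This is a rough sketch that assumes int has no padding bits set; the function name index_of_one_byte is hypothetical.

```c
#include <stddef.h>

/* Returns the index of the single byte of an int with value 1 that
   holds 00000001, or -1 if the representation does not look like that
   (for example, if some other byte is nonzero because of padding bits).
   Reads go through unsigned char, so no trap representation is involved. */
int index_of_one_byte(void) {
    int x = 1;
    const unsigned char *p = (const unsigned char *)&x;
    int found = -1;
    for (size_t i = 0; i < sizeof x; i++) {
        if (p[i] == 1 && found < 0) {
            found = (int)i;      /* first byte equal to 1 */
        } else if (p[i] != 0) {
            return -1;           /* unexpected nonzero byte */
        }
    }
    return found;
}
```

On common implementations one would expect index 0 (little-endian) or sizeof(int) - 1 (big-endian), but the standard does not require either.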

answered Sep 28 '22 02:09 by Eric Postpischil
