Consider the following code:
#include <cstring>
#include <iostream>
int main() {
    int i = 1;
    char c[sizeof(i)];
    std::memcpy(c, &i, sizeof(i));
    std::cout << static_cast<int>(c[0]);
}
Please ignore whether this is good code. I know the output depends on the endianness of the system. This is only an academic question.
Is this code undefined behavior, unspecified behavior, or implementation-defined behavior?
The language does not say that doing this is immediately undefined behavior. It simply says that the representation of c[0] might end up being an invalid (trap) representation, in which case the behavior is indeed undefined. But in cases when c[0] is not a trap representation, the behavior is implementation-defined.
If you use an unsigned char array instead, trap representations become impossible and the behavior becomes purely implementation-defined.
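For illustration, a minimal sketch of that unsigned char variant (the same program as in the question, only the element type changes); the printed byte value is implementation-defined but never a trap representation:
#include <cstring>
#include <iostream>
int main() {
    int i = 1;
    unsigned char c[sizeof(i)];           // bytes of the object representation
    std::memcpy(c, &i, sizeof(i));
    std::cout << static_cast<int>(c[0]);  // implementation-defined, never UB
}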
The rule you are looking for is 3.9p4:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
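As a small illustrative sketch of this wording (not from the original answer), the N bytes of the object representation can be inspected through an unsigned char pointer:
#include <cstddef>
#include <iostream>
int main() {
    int i = 1;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&i);
    for (std::size_t n = 0; n < sizeof(i); ++n)      // N == sizeof(T)
        std::cout << static_cast<int>(p[n]) << ' ';  // each byte's value is implementation-defined
}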
So if you use unsigned char, you do get implementation-defined behavior (any conforming implementation must document what that behavior is).
Reading through char is also legal, but then the values are unspecified. You are, however, guaranteed that using unqualified char will preserve the value (therefore bare char cannot have trap representations or padding bits), according to 3.9p2:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
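A minimal sketch of that round-trip guarantee; the assert is only a check added here for illustration, not part of the quoted rule:
#include <cassert>
#include <cstring>
int main() {
    int obj = 42;
    unsigned char bytes[sizeof(obj)];
    std::memcpy(bytes, &obj, sizeof(obj));  // copy the underlying bytes out
    obj = 0;                                // overwrite the object
    std::memcpy(&obj, bytes, sizeof(obj));  // copy the array content back
    assert(obj == 42);                      // the object holds its original value again
}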
("unspecified" values are a bit weaker than "implementation-defined" values -- the semantics are the same but the platform is not required to document what the values are.)
It is clearly implementation-defined behaviour.
The internal representation of an int is not defined by the standard (implementations can choose little-endian, big-endian, or something else entirely), so it cannot be well-defined behaviour: the result is allowed to differ between architectures.
On a given system (architecture, compiler, and possibly configuration) the behaviour is perfectly determined: on a little-endian machine you will get 1, on a big-endian machine 0. So it is implementation-defined behaviour.
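For illustration (this sketch is not part of the original answer), the same byte inspection doubles as a runtime endianness check:
#include <cstring>
#include <iostream>
int main() {
    int i = 1;
    unsigned char c[sizeof(i)];
    std::memcpy(c, &i, sizeof(i));
    // Little-endian machines store the least significant byte first (c[0] == 1),
    // big-endian machines store the most significant byte first (c[0] == 0).
    std::cout << (c[0] == 1 ? "little-endian" : "big-endian") << '\n';
}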