The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:
- If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
- If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
- If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Additionally, what about compatible but not equivalent integer types?
- On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
- Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
A 32-bit signed integer encodes an integer in the range [-2147483648, 2147483647]. A 32-bit unsigned integer encodes a non-negative integer in the range [0, 4294967295]. On most platforms the signed integer is represented in two's complement notation.
The term "unsigned" in computer programming indicates a variable that can hold only non-negative numbers (zero and positive values). The term "signed" indicates that a variable can hold negative as well as non-negative values. The property can be applied to most integer data types, including int, char, short and long.
The hardware is designed to compare signed to signed and unsigned to unsigned. In C and C++, when an int is compared with an unsigned int, the int operand is converted to unsigned first, so the compiler treats the comparison as one between unsigned values. If you want the arithmetic result, convert the unsigned value to a larger signed type first.
An int is signed by default, meaning it can represent both positive and negative values. An unsigned int is an integer that can never be negative.
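To make the comparison pitfall concrete, here is a minimal sketch. The helper names less_converted and less_arithmetic are hypothetical and chosen only for this illustration. The first spells out the conversion that C applies implicitly to an expression like s < u; the second compares the mathematical values without needing a wider type.

```c
#include <stdbool.h>

/* Hypothetical helper: what the implicit conversion in `s < u` amounts to.
   The int operand is converted to unsigned, so -1 becomes UINT_MAX. */
bool less_converted(int s, unsigned u)
{
    return (unsigned)s < u;
}

/* Hypothetical helper: compares the mathematical values.
   A negative s is smaller than any unsigned value; otherwise the
   conversion to unsigned preserves the value of s. */
bool less_arithmetic(int s, unsigned u)
{
    return s < 0 || (unsigned)s < u;
}
```

With these definitions, less_converted(-1, 1u) is false while less_arithmetic(-1, 1u) is true.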
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Why would it be undefined? There is no integer overflow, since no conversion or computation is done. We take the object representation of an unsigned int object and view it through an int. How the value of the unsigned int object translates to the value of an int is completely implementation-defined.
If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude, though: a cast from int to unsigned is always defined through a congruence:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n
is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
And now consider
10000000 00000001 // -1 in signed magnitude for 16-bit int
This would certainly be 2^15 + 1 if interpreted as an unsigned. A cast would yield 2^16 - 1 though.
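Since sign-magnitude hardware is hard to come by, the difference can be simulated. The following is only a sketch: decode_sign_magnitude16 and convert_mod_2_16 are hypothetical helpers that model a 16-bit sign-magnitude int and the congruence rule quoted above; they do not reflect how any real compiler implements the conversion.

```c
#include <stdint.h>

/* Decode a 16-bit pattern as sign-magnitude: bit 15 is the sign,
   bits 0..14 are the magnitude. */
int32_t decode_sign_magnitude16(uint16_t bits)
{
    int32_t magnitude = bits & 0x7FFF;
    return (bits & 0x8000) ? -magnitude : magnitude;
}

/* Value conversion to a 16-bit unsigned type: the least non-negative
   value congruent to v modulo 2^16, computed explicitly. */
uint16_t convert_mod_2_16(int32_t v)
{
    return (uint16_t)(((int64_t)v % 65536 + 65536) % 65536);
}
```

For the pattern 0x8001, decode_sign_magnitude16 gives -1, and converting that value gives 65535 (2^16 - 1), whereas reading the same pattern as a plain unsigned number gives 32769 (2^15 + 1).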
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
No. They are independent types.
Can char16_t* and uint_least16_t* alias?
Technically not, but that seems to be an unnecessary restriction of the standard.
Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively, in <cstdint>, called the underlying types.
So it should be practically possible without any risks (since there shouldn't be any padding).
If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:
int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;
Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. GCC documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. In such an implementation (assuming no padding bits and no trap representations), converting an integer value and type-punning it can yield different results. As an example, assume sign-magnitude and n being initialized to -1:
int n = -1;                     /* 10000000 00000001, assuming 16-bit integers */
unsigned u1 = (unsigned)n;      /* 11111111 11111111, effectively 2's complement: UINT_MAX */
unsigned u2 = *(unsigned *)&n;  /* 10000000 00000001, only reinterpreted: the value is now INT_MAX + 2u */
Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
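To check how a given implementation behaves, the two operations can be wrapped in small functions and compared. This is a sketch with hypothetical names (convert_value, reinterpret_bits, conversions_agree); the expectation that both agree assumes a two's complement target without padding bits.

```c
/* Value conversion: defined by the standard as reduction modulo UINT_MAX + 1. */
unsigned convert_value(int n)
{
    return (unsigned)n;
}

/* Reinterpretation: reads the same bytes through the corresponding
   unsigned type, which the aliasing rules permit. */
unsigned reinterpret_bits(int n)
{
    return *(unsigned *)&n;
}

/* Nonzero if both approaches give the same result for n on this platform. */
int conversions_agree(int n)
{
    return convert_value(n) == reinterpret_bits(n);
}
```

On a two's complement machine conversions_agree should hold for every int; on a sign-magnitude or ones' complement machine it would fail for negative values.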
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Not exactly. The bit pattern must represent a valid value in the new type; it doesn't matter whether the old value is representable. From C11 (n1570) [omitted footnotes]:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
- the corresponding value with sign bit 0 is negated (sign and magnitude);
- the sign bit has the value -(2^M) (two's complement);
- the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
E.g., an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit, so something like unsigned u = ...; int n = *(int *)&u; may result in a trap representation on such a system (reading which is undefined behaviour), but not the other way round.
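Whether an implementation actually has padding bits can be estimated from the limits in <limits.h>. The following sketch uses hypothetical helper names and simply counts value bits from UINT_MAX and INT_MAX; it says nothing about trap representations, only about how many bits carry value.

```c
#include <limits.h>

/* Value bits of unsigned int: the number of 1 bits in UINT_MAX. */
int unsigned_value_bits(void)
{
    int bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        ++bits;
    return bits;
}

/* Value bits of int, excluding the sign bit: the number of 1 bits in INT_MAX. */
int signed_value_bits(void)
{
    int bits = 0;
    for (int v = INT_MAX; v != 0; v >>= 1)
        ++bits;
    return bits;
}

/* Padding bits: object bits minus value bits (and minus the sign bit for int). */
int unsigned_padding_bits(void)
{
    return (int)(sizeof(unsigned) * CHAR_BIT) - unsigned_value_bits();
}

int signed_padding_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT) - signed_value_bits() - 1;
}
```

The standard guarantees signed_value_bits() <= unsigned_value_bits(); on mainstream platforms both padding counts are expected to be zero, which rules out the padding scenario described above.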
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
I think the standard would allow one of the types to have a padding bit that is always ignored (thus, two different bit patterns can represent the same value, and that bit may be set on initialization) while the same bit is an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:
int n = 42;
long l = *(long *)&n; // UB
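If only the value is needed, a plain conversion (long l = n;) is enough. If the bytes of the int really have to be reinterpreted, copying them with memcpy avoids the aliasing violation. The helper below is a hypothetical sketch: it copies the object representation only when both types have the same size and otherwise falls back to a value conversion. It also assumes that equal size implies the same representation, which holds on common platforms but is not guaranteed.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the bytes of an int as a long without
   dereferencing a long* that points at an int object. */
long int_to_long_bytes(int n)
{
    long l;
    if (sizeof l == sizeof n) {
        /* Same size: copy the object representation byte by byte. */
        memcpy(&l, &n, sizeof l);
    } else {
        /* Different size: a byte copy would be incomplete, so convert the value. */
        l = n;
    }
    return l;
}
```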
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
I'm not sure about C++, but at least for C, char16_t is a typedef, but not necessarily for uint_least16_t; it could very well be a typedef of some implementation-specific __char16_t, some type incompatible with uint_least16_t (or any other type).
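For a particular C implementation, the question can be settled at compile time with a C11 generic selection. This is a sketch assuming a C11 compiler and a library that provides <uchar.h>; the function name is hypothetical, and the result describes only the implementation it is compiled with.

```c
#include <stdint.h>
#include <uchar.h>

/* Returns 1 if char16_t is compatible with uint_least16_t on this
   implementation, 0 otherwise. The selection is resolved at compile time. */
int char16_matches_uint_least16(void)
{
    return _Generic((char16_t)0, uint_least16_t: 1, default: 0);
}
```

If the function returns 1, the two names denote compatible types and pointers to them may alias in C on that implementation; a result of 0 means they may not.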
What happens is not defined, since the C standard does not exactly define how signed integers should be stored, so you cannot rely on the internal representation. Also, no overflow occurs: if you just cast a pointer, nothing happens other than a different interpretation of the binary data in the following calculations.
Edit
Oh, I misread the phrase "but not equivalent integer types", but I'll keep the paragraph for your interest:
Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses, where the data has to lie on multiples of the type's width. If you read an int32 from an address not divisible by 4 (because you cast a pointer to a 2-byte int), your CPU may crash.
You should not rely on the sizes of types. If you choose another compiler or platform, your long and int may not match anymore.
Conclusion:
Do not do this. You wrote highly platform dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.