 

Aliasing of otherwise equivalent signed and unsigned types

The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:

  • If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
  • If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
  • If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct? (A sketch of the accesses I mean follows this list.)
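For concreteness, this is the kind of access I mean (a sketch with arbitrary values):

unsigned int u = 4000000000u;          /* larger than INT_MAX with 32-bit int */
int i = *(int *)&u;                    /* assumption 1 */

int n = -42;
unsigned int w = *(unsigned int *)&n;  /* assumption 2 */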

Additionally, what about compatible but not equivalent integer types?

  • On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
  • Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
asked Nov 24 '14 by Tavian Barnes




3 Answers

If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?

Why would it be undefined? There is no integer overflow, since no conversion or computation is done. We take the object representation of an unsigned int object and look at it through an int. How the value of the unsigned int object carries over to a value of the int is implementation-defined.
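A minimal sketch of this case (the printed value is implementation-defined; the number in the comment assumes a 32-bit two's complement int):

#include <stdio.h>

int main(void) {
    unsigned int u = 3000000000u;  /* greater than INT_MAX with 32-bit int */
    int *p = (int *)&u;            /* signed/unsigned variants may alias */
    printf("%d\n", *p);            /* no conversion: the representation is
                                      reinterpreted; prints -1294967296 on a
                                      32-bit two's complement machine */
    return 0;
}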

If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?

Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude, though: a cast from int to unsigned is always defined through a congruence:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

And now consider

10000000 00000001  // -1 in signed magnitude for 16-bit int

This would certainly be 2^15 + 1 if interpreted as an unsigned. A cast would yield 2^16 - 1 though.
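The cast side of this is easy to check and holds on every conforming implementation, whatever the representation (a small sketch):

#include <assert.h>
#include <limits.h>

int main(void) {
    int n = -1;
    unsigned u = (unsigned)n;  /* defined by the congruence: -1 + (UINT_MAX + 1) */
    assert(u == UINT_MAX);     /* true regardless of how int is represented */
    return 0;
}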

If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?

Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.

On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)

No. They are independent types.

Can char16_t* and uint_least16_t* alias?

Technically not, but that seems to be an unnecessary restriction of the standard.

Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively, in <cstdint>, called the underlying types.

So it should be practically possible without any risks (since there shouldn't be any padding).
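The quoted guarantee can even be spelled out as compile-time checks (a sketch in C11, where char16_t comes from <uchar.h>; in C++ the same assertions hold for the built-in char16_t via static_assert):

#include <uchar.h>
#include <stdint.h>

/* char16_t matches uint_least16_t in size and alignment, per the quote above */
_Static_assert(sizeof(char16_t) == sizeof(uint_least16_t), "same size");
_Static_assert(_Alignof(char16_t) == _Alignof(uint_least16_t), "same alignment");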

answered Oct 13 '22 by Columbo


If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?

For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:

int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;

Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. GCC documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. On such an implementation (assuming no padding bits and no trap representations), conversion of an integer value and type-punning can yield different results. As an example, assume sign-magnitude and n initialized to -1:

int n = -1;                     /* 10000000 00000001, assuming 16-bit int */
unsigned u1 = (unsigned)n;      /* 11111111 11111111, effectively 2's complement: UINT_MAX */
unsigned u2 = *(unsigned *)&n;  /* 10000000 00000001, only reinterpreted: INT_MAX + 2u */

Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
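Spelled out for the 16-bit sign-magnitude example:

Conversion (u1): add UINT_MAX + 1 = 2^16 = 65536 until the value is in range: -1 + 65536 = 65535 = UINT_MAX.
Reinterpretation (u2): read the bits 10000000 00000001 as pure binary: 2^15 + 2^0 = 32769 = INT_MAX + 2.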

If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?

Not exactly. The bit pattern must represent a valid value in the new type; it doesn't matter whether the old value is representable. From C11 (n1570) [omitted footnotes]:

6.2.6.2 Integer types

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);
  • the sign bit has the value -(2^M) (two's complement);
  • the sign bit has the value -(2^M - 1) (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.

E.g., an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit; something like unsigned u = ...; int n = *(int *)&u; may then yield a trap representation on such a system (reading it is undefined behaviour), but not the other way round.
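On such a system the defined route is a value conversion instead of punning; a sketch (the else branch assumes the two's complement range INT_MIN == -INT_MAX - 1):

#include <limits.h>

unsigned u = UINT_MAX - 41;  /* bit pattern of -42 on two's complement */
int n;
if (u <= INT_MAX)
    n = (int)u;                    /* value fits: conversion is exact */
else
    n = -(int)(UINT_MAX - u) - 1;  /* here -42; computed without reading
                                      through an int*, so no trap risk */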

If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?

I think the standard would allow for one of the types to have a padding bit which is always ignored (thus, two different bit patterns can represent the same value, and that bit may be set on initialization), but which is an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:

The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.


On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)

Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:

int n = 42;
long l = *(long *)&n; // UB
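If the bytes really must be reinterpreted, memcpy is the defined route (a sketch, leaning on the question's premise that int and long have the same size):

#include <string.h>

int n = 42;
long l;
_Static_assert(sizeof l == sizeof n, "premise: identical size");  /* C11 */
memcpy(&l, &n, sizeof l);  /* well-defined, unlike the cast above */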

Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.

I'm not sure about C++, but at least for C, char16_t is a typedef, and C11 7.28 requires it to be "the same type as uint_least16_t". So in C the two names denote one and the same type (both may well be typedefs of something implementation-specific such as unsigned short), and the aliasing question doesn't even arise.

answered Oct 13 '22 by mafso


It is not defined what happens, since the C standard does not exactly define how signed integers are stored, so you cannot rely on the internal representation. Also, no overflow occurs: if you just cast a pointer, nothing happens other than a different interpretation of the binary data in the following calculations.

Edit
Oh, I misread the phrase "but not equivalent integer types", but I'll keep the paragraph for your interest:

Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses, where the data has to lie at a multiple of the type's width. If you read an int32 from an address not divisible by 4 (because you cast a 2-byte int pointer), the access may fault and crash your program.
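A sketch of that hazard (whether the access actually faults depends on the target):

#include <stdint.h>

int16_t buf[4] = {0};
int32_t *p = (int32_t *)((char *)buf + 2);  /* misaligned if int32_t needs
                                               4-byte alignment */
int32_t v = *p;  /* undefined behaviour; may raise SIGBUS on strict targets */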

You should not rely on the sizes of types. If you choose another compiler or platform, your long and int may not match anymore.

Conclusion:
Do not do this. You would be writing highly platform-dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.

answered Oct 13 '22 by vlad_tepesch