 

Aliasing of otherwise equivalent signed and unsigned types

The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:

  • If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
  • If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
  • If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct? (A sketch of the accesses I mean follows this list.)
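For concreteness, this is the kind of access I mean (a sketch with arbitrary values):

unsigned int u = 4000000000u;          /* larger than INT_MAX with 32-bit int */
int i = *(int *)&u;                    /* assumption 1 */

int n = -42;
unsigned int w = *(unsigned int *)&n;  /* assumption 2 */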

Additionally, what about compatible but not equivalent integer types?

  • On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
  • Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
asked Nov 24 '14 by Tavian Barnes




3 Answers

If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?

Why would it be undefined? There is no integer overflow, since no conversion or computation is done. We take the object representation of an unsigned int object and look at it through an int. How the value of the unsigned int object carries over to a value of the int is implementation-defined.
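A minimal sketch of this case (the printed value is implementation-defined; the number in the comment assumes a 32-bit two's complement int):

#include <stdio.h>

int main(void) {
    unsigned int u = 3000000000u;  /* greater than INT_MAX with 32-bit int */
    int *p = (int *)&u;            /* signed/unsigned variants may alias */
    printf("%d\n", *p);            /* no conversion: the representation is
                                      reinterpreted; prints -1294967296 on a
                                      32-bit two's complement machine */
    return 0;
}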

If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?

Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude, though: a cast from int to unsigned is always defined through a congruence:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

And now consider

10000000 00000001  // -1 in signed magnitude for 16-bit int

This would certainly be 2^15 + 1 if interpreted as an unsigned. A cast would yield 2^16 - 1 though.
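The cast side of this is easy to check and holds on every conforming implementation, whatever the representation (a small sketch):

#include <assert.h>
#include <limits.h>

int main(void) {
    int n = -1;
    unsigned u = (unsigned)n;  /* defined by the congruence: -1 + (UINT_MAX + 1) */
    assert(u == UINT_MAX);     /* true regardless of how int is represented */
    return 0;
}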

If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?

Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.

On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)

No. They are independent types.

Can char16_t* and uint_least16_t* alias?

Technically not, but that seems to be an unnecessary restriction of the standard.

Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively, in <cstdint>, called the underlying types.

So it should be practically possible without any risks (since there shouldn't be any padding).
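The quoted guarantee can even be spelled out as compile-time checks (a sketch in C11, where char16_t comes from <uchar.h>; in C++ the same assertions hold for the built-in char16_t via static_assert):

#include <uchar.h>
#include <stdint.h>

/* char16_t matches uint_least16_t in size and alignment, per the quote above */
_Static_assert(sizeof(char16_t) == sizeof(uint_least16_t), "same size");
_Static_assert(_Alignof(char16_t) == _Alignof(uint_least16_t), "same alignment");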

answered Oct 13 '22 by Columbo


If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?

For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:

int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;

Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. GCC documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. On such an implementation (assuming no padding bits and no trap representations), conversion of an integer value and type-punning can yield different results. As an example, assume sign-magnitude and n initialized to -1:

int n = -1;                     /* 10000000 00000001, assuming 16-bit int */
unsigned u1 = (unsigned)n;      /* 11111111 11111111, effectively 2's complement: UINT_MAX */
unsigned u2 = *(unsigned *)&n;  /* 10000000 00000001, only reinterpreted: INT_MAX + 2u */

Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
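Spelled out for the 16-bit sign-magnitude example:

Conversion (u1): add UINT_MAX + 1 = 2^16 = 65536 until the value is in range: -1 + 65536 = 65535 = UINT_MAX.
Reinterpretation (u2): read the bits 10000000 00000001 as pure binary: 2^15 + 2^0 = 32769 = INT_MAX + 2.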

If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?

Not exactly. The bit pattern must represent a valid value in the new type; it doesn't matter whether the old value is representable. From C11 (n1570) [omitted footnotes]:

6.2.6.2 Integer types

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);
  • the sign bit has the value -(2^M) (two's complement);
  • the sign bit has the value -(2^M - 1) (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.

E.g., an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit; something like unsigned u = ...; int n = *(int *)&u; may then yield a trap representation on such a system (reading it is undefined behaviour), but not the other way round.
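On such a system the defined route is a value conversion instead of punning; a sketch (the else branch assumes the two's complement range INT_MIN == -INT_MAX - 1):

#include <limits.h>

unsigned u = UINT_MAX - 41;  /* bit pattern of -42 on two's complement */
int n;
if (u <= INT_MAX)
    n = (int)u;                    /* value fits: conversion is exact */
else
    n = -(int)(UINT_MAX - u) - 1;  /* here -42; computed without reading
                                      through an int*, so no trap risk */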

If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?

I think the standard would allow for one of the types to have a padding bit which is always ignored (thus, two different bit patterns can represent the same value, and that bit may be set on initialization), but which is an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:

The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.


On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)

Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:

int n = 42;
long l = *(long *)&n; // UB
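If the bytes really must be reinterpreted, memcpy is the defined route (a sketch, leaning on the question's premise that int and long have the same size):

#include <string.h>

int n = 42;
long l;
_Static_assert(sizeof l == sizeof n, "premise: identical size");  /* C11 */
memcpy(&l, &n, sizeof l);  /* well-defined, unlike the cast above */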

Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.

I'm not sure about C++, but at least for C, char16_t is a typedef, and C11 7.28 requires it to be "the same type as uint_least16_t". So in C the two names denote one and the same type (both may well be typedefs of something implementation-specific such as unsigned short), and the aliasing question doesn't even arise.

answered Oct 13 '22 by mafso


It is not defined what happens, since the C standard does not exactly define how signed integers are stored, so you cannot rely on the internal representation. Also, no overflow occurs: if you just cast a pointer, nothing happens other than a different interpretation of the binary data in the following calculations.

Edit
Oh, I misread the phrase "but not equivalent integer types", but I'll keep the paragraph for your interest:

Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses, where the data has to lie at a multiple of the type's width. If you read an int32 from an address not divisible by 4 (because you cast a 2-byte int pointer), the access may fault and crash your program.
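A sketch of that hazard (whether the access actually faults depends on the target):

#include <stdint.h>

int16_t buf[4] = {0};
int32_t *p = (int32_t *)((char *)buf + 2);  /* misaligned if int32_t needs
                                               4-byte alignment */
int32_t v = *p;  /* undefined behaviour; may raise SIGBUS on strict targets */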

You should not rely on the sizes of types. If you choose another compiler or platform, your long and int may not match anymore.

Conclusion:
Do not do this. You would be writing highly platform-dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.

answered Oct 13 '22 by vlad_tepesch