Can a C compiler change bit representation when casting signed to unsigned?

Is it possible for an explicit cast of, say, int32_t to uint32_t, to alter the bit representation of the value?

For example, given that I have the following union:

typedef union {
    int32_t signed_val;
    uint32_t unsigned_val;
} signed_unsigned_t;

Are these code segments guaranteed by the spec to have the same behaviour?

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    return (uint32_t) input;
}

and

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    signed_unsigned_t converter;
    converter.signed_val = input;
    return converter.unsigned_val;
}

I'm considering C99 here. I've seen a few similar questions, but they all seemed to be discussing C++, not C.

asked Sep 21 '13 by Alexandre Araujo Moreira



1 Answer

Casting a signed integer type to an unsigned integer type of the same width can change the representation, if you can find a machine with sign-magnitude or ones-complement signed representations. But the types int32_t and uint32_t are guaranteed to be two's-complement representations, so in that particular case the representation cannot change.

Conversion of signed integers to unsigned integers is well-defined in the standard, section 6.3.1.3. The relevant algorithm is the second paragraph:

  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
  3. ...

So the result has to be, in effect, what a bit-for-bit copy would have resulted in had the negative number been stored in 2's-complement. A conforming implementation is allowed to use sign-magnitude or ones-complement; in both cases, the representation of negative integers will have to be modified to cast to unsigned.
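
To see the arithmetic of that rule, here is a minimal sketch (not part of the question or answer; the program is mine) that converts a negative int32_t and checks the wrap-around:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t negative = -5;
    uint32_t converted = (uint32_t) negative;     /* 6.3.1.3: add 2^32 once */

    /* -5 + 4294967296 == 4294967291 == UINT32_MAX - 4 */
    printf("%" PRIu32 "\n", converted);           /* prints 4294967291 */
    printf("%d\n", converted == UINT32_MAX - 4);  /* prints 1 */
    return 0;
}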


Summarizing a lengthy and interesting discussion in the comments:

  • In the precise example in the OP, which uses int32_t and uint32_t, the representations must be equal if the program compiles, because C99 requires int32_t and uint32_t to be exactly 32 bits long with no padding, and requires int32_t to use 2's-complement representation. It does not, however, require those types to exist; a ones-complement implementation could simply not define int32_t, and still conform. (A compile-time check for their presence is sketched after this list.)

  • My interpretation of type-punning is below the horizontal rule. @R.. pointed us to a Defect Report from 2004 which seems to say that type-punning is either OK or fires a trap, which is closer to implementation-defined behaviour than undefined behaviour. On the other hand, the suggested resolution of that DR doesn't seem to be in the C11 document, which says (6.2.6.1(5)):

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

That seems to me to be saying that type-punning is undefined behaviour if one of the participating types has a trap representation (and consequently is not undefined behaviour if the reading type does not have a trap representation). On the other hand, no type is required to have a trap representation, and only a few types are prohibited from having one: char and union types -- but not members of union types --, as well as whichever of the [u]int*K_t types are implemented.
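
Returning to the first bullet: C99 requires <stdint.h> to define the associated limit macros exactly when it provides the corresponding types, so a program that depends on int32_t and uint32_t existing can make that dependency explicit at preprocessing time. A minimal sketch, not part of the original answer:

#include <stdint.h>

/* C99 defines INT32_MAX and UINT32_MAX only if int32_t and uint32_t exist,
 * so this refuses to compile on, e.g., a ones-complement implementation
 * that omits the exact-width types. */
#if !defined(INT32_MAX) || !defined(UINT32_MAX)
#error "exact-width 32-bit two's-complement types are required"
#endif

typedef union {
    int32_t signed_val;
    uint32_t unsigned_val;
} signed_unsigned_t;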

My previous statement on type-punning follows:


The storage-punning union has undefined behaviour. But without invoking flying lizards, it is somewhat expected that sign-magnitude or ones-complement machines may throw a hardware exception if a certain value is stored as unsigned and then accessed as signed.

Both ones-complement and sign-magnitude have two possible representations of 0, one with each possible sign bit. The one with a negative sign bit, "negative zero", is allowed to be a trap representation; consequently, accessing the value (even just to copy it) as a signed integer could trigger the trap.

Although the C compiler would be within its rights to suppress the trap, say by copying the value with memcpy or an unsigned opcode, it is unlikely to do so because that would be surprising to a programmer who knew that her program was running on a machine with trapping negative zeros, and was expecting the trap to trigger in the case of an illegal value.
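
As an aside, the memcpy route mentioned above can be written out explicitly. A minimal sketch (the function name is mine, not part of the answer) that copies the object representation instead of reading a differently-typed union member:

#include <stdint.h>
#include <string.h>

uint32_t reinterpret_signed_as_unsigned_memcpy(int32_t input) {
    uint32_t result;
    /* Copy the 32-bit object representation byte for byte.  Because
     * uint32_t has no padding bits, every bit pattern is a valid value,
     * so the subsequent read cannot hit a trap representation. */
    memcpy(&result, &input, sizeof result);
    return result;
}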

answered Oct 13 '22 by rici