
Unsigned vs signed range guarantees

Tags:

c++

c

I've spent some time poring over the standard references, but I've not been able to find an answer to the following:

  • is it technically guaranteed by the C/C++ standard that, given a signed integral type S and its unsigned counterpart U, the absolute value of each possible S is always less than or equal to the maximum value of U?

The closest I've gotten is section 6.2.6.2 of the C99 standard (the wording of the C++ standard is more arcane to me; I assume the two are equivalent on this point):

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. (...) Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N).

So, for hypothetical 4-bit signed/unsigned integer types, does anything prevent the unsigned type from having 1 padding bit and 3 value bits while the signed type has 3 value bits and 1 sign bit? In that case the range of the unsigned type would be [0,7] and that of the signed type [-8,7] (assuming two's complement).

In case anyone is curious, I'm currently relying on a technique for extracting the absolute value of a negative integer: first cast to the unsigned counterpart, then apply the unary minus operator (so that, for instance, with 3 value bits -3 becomes 5 via the cast, modulo 8, and then 3 via unary minus). This would break on the example above for -8, whose absolute value could not be represented in the unsigned type.
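For reference, the cast-then-negate technique described above might look roughly like this in C. This is only a minimal sketch: the function name `uabs` is hypothetical, and the result equals the true absolute value only when it fits in `unsigned int`, which is exactly what the question is about.

```c
#include <limits.h>

/* Hypothetical helper: absolute value of an int, returned as unsigned int.
 * Converting a negative int to unsigned int is well defined (reduction
 * modulo UINT_MAX + 1), and so is unary minus on an unsigned value.
 * Assumption: |x| <= UINT_MAX; otherwise the result is |x| reduced
 * modulo UINT_MAX + 1, not the true absolute value. */
unsigned int uabs(int x)
{
    if (x < 0) {
        unsigned int u = (unsigned int)x; /* e.g. -3 becomes UINT_MAX - 2 */
        return -u;                        /* wraps back around to 3 */
    }
    return (unsigned int)x;
}
```

Calling `uabs(-3)` yields 3 without ever negating a signed value, so the signed overflow of `-INT_MIN` is avoided.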

EDIT: thanks for the replies below Keith and Potatoswatter. Now, my last point of doubt is on the meaning of "subrange" in the wording of the standard. If it means a strictly "less-than" inclusion, then my example above and Keith's below are not standard-compliant. If the subrange is intended to be potentially the whole range of unsigned, then they are.

bluescarni asked Jul 07 '12 03:07



2 Answers

For C, the answer is no, there is no such guarantee.

I'll discuss types int and unsigned int; this applies equally to any corresponding pair of signed and unsigned types (other than signed char and unsigned char, neither of which can have padding bits).

The standard, in the section you quoted, implicitly guarantees that UINT_MAX >= INT_MAX, which means that every non-negative int value can be represented as an unsigned int.

But the following would be perfectly legal (I'll use ** to denote exponentiation):

CHAR_BIT == 8
sizeof (int) == 4
sizeof (unsigned int) == 4
INT_MIN  = -2**31
INT_MAX  = +2**31-1
UINT_MAX = +2**31-1

This implies that int has 1 sign bit (as it must) and 31 value bits, an ordinary 2's-complement representation, and unsigned int has 31 value bits and one padding bit. unsigned int representations with that padding bit set might either be trap representations, or extra representations of values with the padding bit unset.

This might be appropriate for a machine with support for 2's-complement signed arithmetic, but poor support for unsigned arithmetic.

Given these characteristics, -INT_MIN (the mathematical value) is outside the range of unsigned int.

On the other hand, I seriously doubt that there are any modern systems like this. Padding bits are permitted by the standard, but are very rare, and I don't expect them to become any more common.

You might consider adding something like this:

#include <limits.h>

/* In #if, -INT_MIN is evaluated in intmax_t, not int. */
#if -INT_MIN > UINT_MAX
#error "Nope"
#endif

to your source, so it will compile only if you can do what you want. (You should think of a better error message than "Nope", of course.)
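The same check can also be written outside the preprocessor. The sketch below assumes a C11 compiler (for `_Static_assert`) and rearranges the comparison so that it never evaluates `-INT_MIN` itself; the helper name `abs_fits_in_unsigned` is hypothetical and only shows the same condition at run time.

```c
#include <limits.h>

/* -(INT_MIN + 1) is always representable as an int, so nothing here can
 * overflow. Mathematically, -INT_MIN <= UINT_MAX is equivalent to
 * -(INT_MIN + 1) <= UINT_MAX - 1. */
_Static_assert((unsigned int)-(INT_MIN + 1) <= UINT_MAX - 1,
               "unsigned int cannot represent -INT_MIN");

/* Hypothetical helper: the same condition, evaluated at run time.
 * Returns 1 if every int has an absolute value that fits in unsigned int. */
int abs_fits_in_unsigned(void)
{
    return (unsigned int)-(INT_MIN + 1) <= UINT_MAX - 1;
}
```

On an implementation like the hypothetical one above, where UINT_MAX equals INT_MAX, the static assertion would fail and stop the build.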

Keith Thompson answered Sep 28 '22 00:09



You got it. In C++11 the wording is more clear. §3.9.1/3:

The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.

But, what really is the significance of the connection between the two corresponding types? They are the same size, but that doesn't matter if you just have local variables.

In case anyone is curious, I'm currently relying on a technique for extracting the absolute value of a negative integer: first cast to the unsigned counterpart, then apply the unary minus operator (so that, for instance, with 3 value bits -3 becomes 5 via the cast, modulo 8, and then 3 via unary minus). This would break on the example above for -8, whose absolute value could not be represented in the unsigned type.

You need to deal with whatever numeric ranges the machine supports. Instead of casting to the unsigned counterpart, cast to whatever unsigned type is sufficient: one larger than the counterpart if necessary. If no large enough type exists, then the machine may be incapable of doing what you want.
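As a rough illustration of casting to a wider unsigned type, the sketch below uses `uintmax_t` from `<stdint.h>` as the "sufficient" type. That choice and the name `wide_abs` are assumptions made here for illustration; the result is the true absolute value only if `uintmax_t` can hold `-INT_MIN`.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: absolute value of an int, computed in uintmax_t.
 * The conversion of a negative value is done modulo UINTMAX_MAX + 1 and
 * the unsigned negation wraps back to |x|.
 * Assumption: UINTMAX_MAX >= -INT_MIN (mathematically). */
uintmax_t wide_abs(int x)
{
    uintmax_t u = (uintmax_t)x;
    return x < 0 ? -u : u;
}
```

With a 32-bit two's-complement `int` and a 64-bit `uintmax_t`, `wide_abs(INT_MIN)` gives 2147483648, which no `int` can hold.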

Potatoswatter answered Sep 27 '22 23:09
