Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C standard on negative zero (1's complement and signed magnitude)

All of these functions gives the expected result on my machine. Do they all work on other platforms?

More specifically, if x has the bit representation 0xffffffff on 1's complement machines or 0x80000000 on signed magnitude machines what does the standard says about the representation of (unsigned)x ?

Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is this correct?

Assume sizeof(int) = 4 and CHAR_BIT = 8

int logicalrightshift_v1 (int x, int n) {

    return (unsigned)x >> n;
}

int logicalrightshift_v2 (int x, int n) {

    int msb = 0x4000000 << 1;
    return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v2a (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v3 (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (x < 0 ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v4 (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (((unsigned)x & 0x80000000) >> n);
}

int logicalrightshift_v5 (int x, int n) {

    unsigned y;
    *(int *)&y = x;
    y >>= n;
    *(unsigned *)&x = y;
    return x;
}

int logicalrightshift_v6 (int x, int n) {

    unsigned y;
    memcpy (&y, &x, sizeof (x));
    y >>= n;
    memcpy (&x, &y, sizeof (x));
    return x;
}
like image 702
tyty Avatar asked Oct 28 '11 05:10

tyty


2 Answers

If x has the bit representation 0xffffffff on 1's complement machines or 0x80000000 on signed magnitude machines what does the standard says about the representation of (unsigned)x ?

The conversion to unsigned is specified in terms of values, not representations. If you convert -1 to unsigned, you always get UINT_MAX (so if your unsigned is 32 bits, you always get 4294967295). This happens regardless of the representation of signed numbers that your implementation uses.

Likewise, if you convert -0 to unsigned then you always get 0. -0 is numerically equal to 0.

Note that a ones complement or sign-magnitude implementation is not required to support negative zeroes; if it does not, then accessing such a representation causes the program to have undefined behaviour.

Going through your functions one-by-one:

int logicalrightshift_v1(int x, int n)
{
    return (unsigned)x >> n;
}

The result of this function for negative values of x will depend on UINT_MAX, and will further be implementation-defined if (unsigned)x >> n is not within the range of int. For example, logicalrightshift_v1(-1, 1) will return the value UINT_MAX / 2 regardless of what representation the machine uses for signed numbers.

int logicalrightshift_v2(int x, int n)
{
    int msb = 0x4000000 << 1;
    return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}

Almost everything about this is could be implementation-defined. Assuming that you are attempting to create a value in msb with 1 in the sign bit and zeroes in the value bits, you cannot do this portably by use of shifts - you can use ~INT_MAX, but this is allowed to have undefined behaviour on a sign-magnitude machine that does not allow negative zeroes, and is allowed to give an implementation-defined result on two's complement machines.

The types of 0x7fffffff and 0x80000000 will depend on the ranges of the various types, which will affect how other values in this expression are promoted.

int logicalrightshift_v2a(int x, int n)
{
    return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}

If you create an unsigned value that is not in the range of int (for example, given a 32bit int, values > 0x7fffffff) then the implicit conversion in the return statement produces an implementation-defined value. The same applies to v3 and v4.

int logicalrightshift_v5(int x, int n)
{
    unsigned y;
    *(int *)&y = x;
    y >>= n;
    *(unsigned *)&x = y;
    return x;
}

This is still implementation defined, because it is unspecified whether the sign bit in the representation of int corresponds to a value bit or a padding bit in the representation of unsigned. If it corresponds to a padding bit it could be a trap representation, in which case the behaviour is undefined.

int logicalrightshift_v6(int x, int n)
{
    unsigned y;
    memcpy (&y, &x, sizeof (x));
    y >>= n;
    memcpy (&x, &y, sizeof (x));
    return x;
}

The same comments applying to v5 apply to this.

Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is this correct?

It depends. As a hex constant, 0x80000000 will have type int if that value is within the range of int; otherwise unsigned if that value is within the range of unsigned; otherwise long if that value is within the range of long; otherwise unsigned long (because that value is within the minimum allowed range of unsigned long).

If you wish to ensure that it has unsigned type, then suffix the constant with a U, to 0x80000000U.


Summary:

  1. Converting a number greater than INT_MAX to int gives an implementation-defined result (or indeed, allows an implementation-defined signal to be raised).

  2. Converting an out-of-range number to unsigned is done by repeated addition or subtraction of UINT_MAX + 1, which means it depends on the mathematical value, not the representation.

  3. Inspecting a negative int representation as unsigned is not portable (positive int representations are OK, though).

  4. Generating a negative zero through use of bitwise operators and trying to use the resulting value is not portable.

If you want "logical shifts", then you should be using unsigned types everywhere. The signed types are designed for dealing with algorithms where the value is what matters, not the representation.

like image 125
caf Avatar answered Oct 09 '22 07:10

caf


If you follow the standard to the word, none of these are guaranteed to be the same on all platforms.

In v5, you violate strict-aliasing, which is undefined behavior.

In v2 - v4, you have signed right-shift, which is implementation defined. (see comments for more details)

In v1, you have signed to unsigned cast, which is implementation defined when the number is out of range.

EDIT:

v6 might actually work given the following assumptions:

  • 'int' is either 2's or 1's complement.
  • unsigned and int are exactly the same size (in both bytes and bits, and are densely packed).
  • The endian of unsigned matches that of int.
  • The padding and bit-layout is the same: (See caf's comment for more details.)
like image 35
Mysticial Avatar answered Oct 09 '22 08:10

Mysticial