
Unsigned and Signed Extension

Can someone explain the following code output to me:

#include <stdio.h>

void myprint(unsigned long a)
{
    printf("Input is %lx\n", a);
}

int main()
{
    myprint(1 << 31);
    myprint(0x80000000);
}

Output with gcc main.c:

Input is ffffffff80000000
Input is 80000000

Why is (1 << 31) treated as signed while 0x80000000 is treated as unsigned?

asked May 02 '16 11:05 by Saksham Jain



2 Answers

In C, the type of an expression's result depends on the types of its operands (or of some of them). In particular, 1 is an int (signed), so 1 << n is also an int.

The type (including signedness) of 0x80000000 is determined by the rules for integer constants (C11 6.4.4.1), and it depends on the sizes of int and the other integer types on your system, which you haven't specified. The first type in the list that can represent 0x80000000 (a large positive number) is chosen.

In case of any misconception: the literal 0x80000000 is a large positive number. People sometimes mistakenly equate it with a negative number, confusing values with representations.

In your question you ask why 0x80000000 is treated as unsigned. However, your code does not actually rely on the signedness of 0x80000000. The only thing you do with it is pass it to a function that takes an unsigned long parameter, so whether it is signed or unsigned doesn't matter: when it is passed, it is converted to an unsigned long with the same value. (Since 0x80000000 is within the minimum guaranteed range for unsigned long, which extends to at least 4294967295, there is no chance of it being out of range.)

So, that's 0x80000000 dealt with. What about 1 << 31? If your system has 32-bit (or narrower) int, this causes undefined behaviour due to signed arithmetic overflow. If your system has wider ints, it produces the same output as the 0x80000000 line.

If you use 1u << 31 instead, and you have 32-bit ints, then there is no undefined behaviour and you are guaranteed to see the program output 80000000 twice.

Since your first line of output was not 80000000, we can conclude that your system has 32-bit (or narrower) int and that your program actually causes undefined behaviour. The type of 0x80000000 would be unsigned int if int is 32-bit, or unsigned long otherwise.

answered Nov 15 '22 23:11 by M.M


Why is (1 << 31) treated as signed while 0x80000000 is treated as unsigned?

From 6.5.7 Bitwise shift operators in the C11 standard:

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

So, because 1 is an int (see 6.4.4.1, quoted below), 1 << 31 is also an int. On systems where int is 32 bits or narrower, 1 × 2^31 is not representable in int, so the behavior is undefined (it may even trap).


From 6.4.4.1 Integer constants

3 A decimal constant begins with a nonzero digit and consists of a sequence of decimal digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and the letters a (or A) through f (or F) with values 10 through 15 respectively.

and

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix   | Decimal constant      | Octal or hexadecimal constant
---------+-----------------------+-------------------------------
none     | int                   | int
         | long int              | unsigned int
         | long long int         | long int
         |                       | unsigned long int
         |                       | long long int
         |                       | unsigned long long int
---------+-----------------------+-------------------------------
u or U   | unsigned int          | unsigned int
[...]    | [...]                 | [...]

So, on a system where int is 32 bits or narrower and unsigned int is at least 32 bits, 0x80000000 is an unsigned int.

answered Nov 15 '22 23:11 by Mohit Jain