Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Wrap around explanation for signed and unsigned variables in C?

I read a bit in C spec that unsigned variables(in particular unsigned short int) perform some so called wrap around on integer overflow, although I couldn't find anything on signed variables except that I left with undefined behavior.

My professor told me that their values also get wrapped around (maybe he just meant gcc). I thought the bits just get truncated and the bits I left with give me some weird value!

What wrap around is and how is it different from just truncating bits.

like image 394
orustammanapov Avatar asked Nov 07 '13 17:11

orustammanapov


People also ask

What is wrapping around in C?

An integer overflow or wraparound occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may wrap to become a very small or negative number.

What is signed and unsigned variable in C?

The int type in C is a signed integer, which means it can represent both negative and positive numbers. This is in contrast to an unsigned integer (which can be used by declaring a variable unsigned int), which can only represent positive numbers.

Do unsigned ints wrap around?

Unsigned integer types implement modulo arithmetic. The modulo is equal 2^N where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.

How are signed and unsigned integers stored in C explain with an example?

Variables such as integers can be represent in two ways, i.e., signed and unsigned. Signed numbers use sign flag or can be distinguish between negative values and positive values. Whereas unsigned numbers stored only positive numbers but not negative numbers.


2 Answers

Signed integer variables do not have wrap-around behavior in C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note BTW that GCC compiler you mentioned is known for implementing strict overflow semantics in optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior situations: GCC compiler assumes that signed integer values never wrap around. That means that GCC actually happens to be one of the compilers in which you cannot rely on wrap-around behavior of signed integer types.

For example, GCC compiler can assume that for variable int i the following condition

if (i > 0 && i + 1 > 0)

is equivalent to a mere

if (i > 0)

This is exactly what strict overflow semantics means.

Unsigned integer types implement modulo arithmetic. The modulo is equal 2^N where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.

However, C language never performs arithmetic computations in domains smaller than that of int/unsigned int. Type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). Which means that 1) the computations with unsigned short int will be preformed in the domain of int, with overflow happening when int overflows, 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.

For example, this code produces a wrap around

unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */

while this code

unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */

leads to undefined behavior.

If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced by modulo 2^N, which will appear as if the value has wrapped around.

like image 111
AnT Avatar answered Oct 04 '22 03:10

AnT


Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).

This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.

Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -0
101  = -1
110  = -2
111  = -3

Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.

One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -3
101  = -2
110  = -1 
111  = -0

I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.

Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -4
101  = -3
110  = -2
111  = -1

If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.

Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees of what will happen when you overflow a signed integer type. So it leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.

like image 26
John Bode Avatar answered Oct 04 '22 02:10

John Bode