
Can someone explain overflow in C# using binary?

I'm currently reading a book on C# programming. It has a brief intro to overflow and underflow, and the writer gives a general idea of what happens when you go over the allowed range of a certain type.

Example

short a = 30000;
short b = 30000;
short sum = (short)(a + b); // Explicitly cast back into short
Console.WriteLine(sum); // This would output the value -5536

So the short type only has a range from -32768 to 32767, and in the book the explanation given by the writer is: "For integer types (byte, short, int, and long), the most significant bits (which overflowed) get dropped. This is especially strange, as the computer then interprets it as wrapping around. This is why we end up with a negative value in our example. You can easily see this happening if you start with the maximum value for a particular type (for example, short.MaxValue) and adding one to it." (The C# Player's Guide, Second Edition, chapter 9, page 58)

You'd end up at the minimum (-32768).
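Here's a small console snippet of my own (not from the book) that reproduces that wrap:

using System;

short max = short.MaxValue;          // 32767 = 0111111111111111
short wrapped = (short)(max + 1);    // max + 1 is computed as an int, then cast back (unchecked by default)
Console.WriteLine(wrapped);                      // -32768
Console.WriteLine(Convert.ToString(wrapped, 2)); // 1000000000000000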

I'm having a problem understanding this. I get confused when the writer talks about "the computer interprets it as wrapping around".

The way I tried understanding this was that the short type uses 2 bytes (16 bits).

So the number 32767 = 0111111111111111. If I add 1 to that binary string, I end up with 32768 = 1000000000000000, which cannot be represented with the short type since its max value is 32767, so I get -32768. Why does it end up negative?

I understand the concept of using two's complement to represent negative numbers. Could someone correct my thinking here or elaborate? I don't fully understand why we only use 15 of the 16 bits to represent the positive range and the most significant bit for the negative range.

asked May 25 '16 by Shabubble

People also ask

What is overflow example?

An example of an 8-bit overflow occurs in the binary sum 11111111 + 1 (denary: 255 + 1). The total needs more than 8 binary digits; when this happens the CPU drops the overflow digit because it cannot be stored anywhere, and the computer thinks 255 + 1 = 0.
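A minimal C# sketch of that 8-bit wraparound (not part of the quoted answer):

using System;

byte b = 255;                  // 11111111
b = unchecked((byte)(b + 1));  // b + 1 is 256 as an int; the ninth bit is dropped by the cast
Console.WriteLine(b);          // 0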

What is the meaning of overflow in programming?

In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximum or lower than the minimum representable value.

How do you know if it's overflow or underflow?

Simply put, overflow and underflow happen when we assign a value that is out of range of the declared data type of the variable. If the (absolute) value is too big, we call it overflow; if it is too small, we call it underflow.

How do you know if an integer is overflowing?

Write a C function, int addOvf(int* result, int a, int b). If there is no overflow, the function places the sum a + b in result and returns 0. Otherwise it returns -1. The solution of casting to long and adding to detect the overflow is not allowed.
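That describes a C function; a rough C# analogue (TryAdd is a hypothetical name, using the sign-bit trick rather than a wider type) might look like this:

using System;

// Signed overflow happened iff a and b share a sign and the wrapped sum's sign differs.
static bool TryAdd(int a, int b, out int result)
{
    result = unchecked(a + b);                    // wraps instead of throwing
    return ((a ^ result) & (b ^ result)) >= 0;    // true when no overflow occurred
}

Console.WriteLine(TryAdd(int.MaxValue, 1, out int r1)); // False: the sum wrapped to int.MinValue
Console.WriteLine(TryAdd(30000, 30000, out int r2));    // True: r2 == 60000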


1 Answer

Ignore everyone who is telling you that the top bit is a sign bit. That is the wrong way to think about it.

The right way to think about it is:

  • We have 65536 possible bit patterns.
  • Therefore we can represent 65536 possible numbers.
  • We must have a map that assigns meaning to each bit pattern.

For unsigned shorts, we assign bit patterns as follows:

0000000000000000 --> 0
0000000000000001 --> 1
...
0111111111111111 --> 32767
1000000000000000 --> 32768
1000000000000001 --> 32769
...
1111111111111111 --> 65535

For signed shorts, we use the following convention:

0000000000000000 --> 0
0000000000000001 --> 1
...
0111111111111111 --> 32767
1000000000000000 --> -32768
1000000000000001 --> -32767
...
1111111111111111 --> -1

Simple as that.
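A minimal C# sketch of reading the same 16 bits through each map (assuming the default unchecked context):

using System;

ushort bits = 0b1000_0000_0000_0000;          // the pattern 1000000000000000
short reinterpreted = unchecked((short)bits); // same 16 bits, read through the signed map

Console.WriteLine(bits);          // 32768  (unsigned convention)
Console.WriteLine(reinterpreted); // -32768 (signed, two's-complement convention)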

Why do we use this convention?

Three reasons:

(1) The first 32K values are the same whether you are signed or unsigned. That is very convenient.

(2) In both conventions, "all zero bits" means zero.

(3) Because addition works exactly the same in both conventions!

We have

0000000000000000 --> 0

and we want to add one. We add one using binary rules and we get:

0000000000000001 --> 1

This works whether the short is signed or unsigned.

We have unsigned short:

1000000000000000 --> 32768

We wish to add one. We do so using binary rules, and we get the right answer:

1000000000000001 --> 32769

Same for signed shorts. We have

1000000000000000 --> -32768

and we wish to add one. We do so with binary rules and we get:

1000000000000001 --> -32767

Similarly, you can verify that by adding 1111111111111111 to any binary number, you get "one less", and therefore subtraction of one works as well as addition of one. You can then go on to show that in general, addition and subtraction work the same in both signed and unsigned arithmetic, which means the processor doesn't have to know whether the compiler thinks the values are signed or unsigned.
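As a concrete check, here is a small C# sketch (assuming the default unchecked context) showing that addition produces the identical bit pattern whether we call the operands signed or unsigned:

using System;

unchecked
{
    ushort ua = 0b1000_0000_0000_0000, ub = 1;   // 32768 and 1
    short sa = (short)ua, sb = (short)ub;        // -32768 and 1: the same bits

    ushort usum = (ushort)(ua + ub);             // 32769
    short ssum = (short)(sa + sb);               // -32767

    // Identical 16-bit result either way; only the interpretation differs.
    Console.WriteLine(Convert.ToString(usum, 2).PadLeft(16, '0')); // 1000000000000001
    Console.WriteLine(Convert.ToString(ssum, 2).PadLeft(16, '0')); // 1000000000000001
}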

That is why we use twos complement: because the underlying math is exactly the same whether you are doing signed or unsigned arithmetic.

Notice that I said nothing whatsoever about a "sign bit" in there. The fact that the high bit is set for negative numbers is just a nice bonus. The real property we want is to only have to build hardware to do math once.

Twos complement is just a convention for assigning one of two possible meanings to bit patterns based on the type associated with the variable holding that bit pattern. As an industry, we chose this convention because it is cheap to make high performance hardware that uses this convention.

There are a great many other conventions that we could have chosen. For example, we could have said that for signed numbers we use the map:

0000000000000000 --> -32768
0000000000000001 --> -32767
...
0111111111111111 --> -1
1000000000000000 --> 0
1000000000000001 --> 1
...
1111111111111111 --> 32767

Notice that this is exactly the same as before except for the top bit!

In this interpretation, addition still works as we expect. But this map does not have the property that the first 32K values are the same between short and ushort, and it does not have the property that all zero bits means zero. So we do not use this convention, because it is less convenient. In this convention, converting from short to ushort would require doing an addition. Setting a short to zero requires setting the bytes to something other than zero. We could use a convention where the signed numbers were "in order", we just don't because it is painful to do so.
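For illustration only, a sketch of that alternative "offset" convention (DecodeOffset is a hypothetical helper; this is not how C# maps shorts):

using System;

// Decoder for the hypothetical offset convention: value = (unsigned bit pattern) - 32768.
static int DecodeOffset(ushort bits) => bits - 32768;

Console.WriteLine(DecodeOffset(0b0000_0000_0000_0000)); // -32768: all-zero bits no longer mean zero
Console.WriteLine(DecodeOffset(0b1000_0000_0000_0000)); // 0
Console.WriteLine(DecodeOffset(0b1111_1111_1111_1111)); // 32767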

answered Oct 12 '22 by Eric Lippert