The situation is the following: say that the signed 32-bit binary integer 11111111001101100000101011001000 is negative simply due to an overflow. This is a practical, existing problem, since you might want to allocate more bytes than you can describe in a 32-bit integer, but the value then gets read in as a 64-bit integer.
Now, on a 64-bit machine, which of the following statements is correct (if any at all)?
1. malloc reads this as a 64-bit integer, finding 11111111001101100000101011001000################################, with # being a wildcard bit representing whatever data is stored after the original integer. In other words, it reads a result close to the maximum value 2^64 and tries to allocate some quintillion bytes. It fails.
2. malloc reads this as a 64-bit integer, casting it to 0000000000000000000000000000000011111111001101100000101011001000, possibly because that is how it is loaded into a register, leaving a lot of bits zero. It does not fail, but allocates the "negative" amount of memory as if reading a positive unsigned value.
3. malloc reads this as a 64-bit integer, casting it to ################################11111111001101100000101011001000, possibly because that is how it is loaded into a register, with # a wildcard representing whatever data was previously in the register. It fails quite unpredictably, depending on the last value.
I actually tested this, and the malloc failed (which would imply that either 1 or 3 is correct). I assume 1 is the most logical answer. I also know the fix (using size_t as input instead of int).
I'd just really like to know what actually happens. For some reason I can't find any clarification on how 32-bit integers are actually treated on 64-bit machines for such an unexpected 'cast'. I'm not even sure whether its being in a register actually matters.
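For concreteness, here is a minimal sketch of the scenario, with the negative value hard-coded rather than produced by a real overflow (the overflow itself would be undefined behavior); the value -13235512 is simply the signed interpretation of the bit pattern above:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Hard-coded stand-in for a size computation gone negative:
       -13235512 has the 32-bit two's complement pattern 0xFF360AC8,
       i.e. the bit string from the question. */
    int n = -13235512;

    /* malloc's parameter is size_t, so the negative int is implicitly
       converted to an unsigned 64-bit value at the point of the call. */
    void *p = malloc(n);
    if (p == NULL)
        printf("malloc failed; the converted request was %zu bytes\n", (size_t)n);
    else
        free(p);
    return 0;
}
```

With the fix mentioned above, the size would be held in a size_t from the start, so no negative value would ever be converted.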
An integer overflow can cause the value to wrap and become negative, which violates the program's assumptions and may lead to unexpected behavior (for example, 8-bit signed addition of 127 + 1 results in −128, because the bit pattern of 128 is read as −128 in two's complement).
An integer overflow can lead to data corruption, unexpected behavior, infinite loops and system crashes.
When an integer is created, the computer allocates 32 bits to store its value. However, there may be data that is larger than 32 bits; for example, a sextillion (a billion trillion) needs about 70 binary digits. When an integer value does not fit in 32 bits, an integer overflow occurs.
If a program performs a calculation and the true answer is larger than the available space, it may result in an integer overflow. These integer overflows can cause the program to use incorrect numbers and respond in unintended ways, which can then be exploited by attackers.
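To illustrate the 8-bit case mentioned above, here is a small sketch; note that the addition itself is carried out in int, and it is the store back into the 8-bit variable that produces the wrapped value on typical two's complement implementations:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t a = INT8_MAX;            /* 127 */

    /* a + 1 is computed as an int (128); converting that back to int8_t
       is implementation-defined and on common two's complement targets
       yields -128. */
    int8_t b = (int8_t)(a + 1);

    printf("%d + 1 stored in an int8_t gives %d\n", a, b);
    return 0;
}
```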
The problem with your reasoning is that it starts from the assumption that the integer overflow will result in a deterministic and predictable operation.
This, unfortunately, is not the case: undefined behavior means that anything can happen, and notably that compilers may optimize as if it could never happen.
As a result, it is nigh impossible to predict what kind of program the compiler will produce if there is such a possible overflow.
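As a sketch of what such an optimization can look like (this is the classic textbook example, not code from the question), a compiler is entitled to assume that x + 1 never overflows and to delete an after-the-fact check:

```c
#include <limits.h>

/* Intended as an overflow guard, but it relies on signed wraparound.
   Because signed overflow is undefined behavior, an optimizing compiler
   may assume that `x + 1 < x` is always false for signed x and remove
   the branch entirely. */
int next_size(int x)
{
    if (x + 1 < x)        /* may be optimized away */
        return INT_MAX;
    return x + 1;
}
```

A robust version checks before adding, e.g. `if (x == INT_MAX)`.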
Coming back to the question: the call to malloc may end up being handed anything from 0 to size_t(-1), and thus may allocate either too little or too much memory, or even fail to allocate at all.

Undefined Behavior => All Bets Are Off
Once an integer overflows, using its value results in undefined behavior. A program that uses the result of an int after the overflow is invalid according to the standard -- essentially, all bets about its behavior are off.
With this in mind, let's look at what's going to happen on a computer where negative numbers are stored in two's complement representation. When you add two large 32-bit integers on such a computer, you get a negative result in case of an overflow.
However, according to the C++ standard, the type of malloc's argument, i.e. size_t, is always unsigned. When you convert a negative number to an unsigned number, it gets sign-extended (see this answer for a discussion and a reference to the standard), meaning that the most significant bit of the original (which is 1 for all negative numbers) is copied into each of the top 32 bits of the unsigned result.
Therefore, what you get is a modified version of your third case, except that instead of "wildcard bits #" it has ones all the way to the top. The result is a gigantic unsigned number (roughly 16 exbibytes or so); naturally, malloc fails to allocate that much memory.
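A quick way to see this, assuming a 64-bit two's complement machine, is to print the converted value directly (the hex output below is simply what one would expect under those assumptions):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t n = -13235512;      /* 32-bit pattern 0xFF360AC8, as in the question */
    size_t  s = (size_t)n;      /* well-defined conversion: value modulo 2^64 */

    /* Expected output on such a machine: 0xffffffffff360ac8 --
       the top 32 bits are all ones, not wildcards. */
    printf("%#zx (%zu bytes)\n", s, s);
    return 0;
}
```

That request is within a whisker of 2^64 bytes, which is why the allocation fails.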
So if we have a specific code example, a specific compiler, and a specific platform, we can probably determine what the compiler is doing; that is the approach taken in Deep C. But even then it may not be fully predictable, which is a hallmark of undefined behavior, so generalizing about undefined behavior is not a good idea.
We only have to take a look at the advice from the gcc documentation to see how messy it can get. The documentation offers some good advice on integer overflow, which says:
In practice many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors expected.
and, in the sub-section Practical Advice for Signed Overflow Issues, it says:
Ideally the safest approach is to avoid signed integer overflow entirely. [...]
At the end of the day it is undefined behavior and therefore unpredictable in the general case. But in the case of gcc, its section on implementation-defined integer behavior says that the value wraps around:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
but in their advice about integer overflow they explain how optimization can cause problems with wraparound:
Compilers sometimes generate code that is incompatible with wraparound integer arithmetic.
So this quickly gets complicated.
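For completeness, here is a small sketch of the conversion behavior gcc documents; the wrapped result is only what gcc guarantees, other compilers may document something else, and arithmetic overflow (as opposed to conversion) stays undefined unless you opt into wrapping with something like gcc's -fwrapv:

```c
#include <stdio.h>

int main(void)
{
    unsigned long long big = 4281731784ULL;   /* 0xFF360AC8, too large for a 32-bit int */

    /* Per the gcc documentation quoted above, converting an out-of-range
       value to a signed type of width N reduces it modulo 2^N; with gcc
       and a 32-bit int this yields -13235512. */
    int converted = (int)big;
    printf("%d\n", converted);
    return 0;
}
```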