I heard that, when computing mean value, start+(end-start)/2 differs from (start+end)/2 because the latter can cause overflow. I do not quite understand why this second one can cause overflow while the first one does not. What are the generic rule to implement a math formula that can avoid overflow.
Suppose you are using a computer where the maximum integer value is 10 and you want to compute the average of 5 and 7.
The first method (begin + (end-begin)/2) gives
5 + (7-5)/2 == 5 + 2/2 == 6
The second method (begin + end)/2 gives an overflow, since the intermediate 12 value is over the maximum value of 10 that we accept and "wraps over" to something else (if you are using unsigned numbers its usual to wrap back to zero but if your numbers are signed you could get a negative number!).
12/2 => overflow occurs => 2/2 == 1
Of course, in real computers integers overflow at a large value like 2^32 instead of 10, but the idea is the same. Unfortunately, there is no "general" way to get rid of overflow that I know of, and it greatly depends on what particular algorithm you are using. And event then, things get more complicated. You can get different behaviour depending on what number type you are using under the hood and there are other kinds of numerical errors to worry about in addition to over and underflow.
Both your formulas will overflow, but under different circumstances:
(start+end)
part of your (start+end)/2
formula will overflow when start
and end
are both close to the integer limit on the same side of the range (i.e. both positive or both negative).(end-start)
part of your start+(end-start)/2
formula will overflow when start
is positive and end
is negative, and both values are close to the respective ends of the representable integer values.There are no "generic" rules, you do it case-by-case: look at parts of your formula, think of situations that could cause overflow, and come up with ways to avoid it. For example, the start+(end-start)/2
formula can be shown to avoid overflow when you average values with the same sign.
This is the hard way; the easy way is to use higher-capacity representations for intermediate results. For example, if you use long long
instead of int
to do intermediate calculations and copy the results back to int
only when you are done, you will avoid overflow assuming that the end result fits in an int
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With