Safe integer middle value formula

I am looking for an efficient formula working in Java which calculates the following expression:

(low + high) / 2

which is used for binary search. So far, I have been using "low + (high - low) / 2" and "high - (high - low) / 2" to avoid overflow and underflow in some cases, but neither form handles every case. Now I am looking for an efficient way to do this that works for any pair of integers (assuming integers range from -MAX_INT - 1 to MAX_INT).
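To make the problem concrete, here is a minimal sketch of where the usual formulas break. The class and method names are made up for illustration, and it assumes Java's 32-bit `int` with wrap-around arithmetic.

```java
class MidpointOverflowDemo {
    // Naive form: low + high can exceed the int range and wrap around.
    static int naive(int low, int high) {
        return (low + high) / 2;
    }

    // Common fix: correct as long as high - low itself fits in an int.
    static int lowPlusHalfDiff(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        int min = Integer.MIN_VALUE;

        // Two large positive bounds: the sum wraps to -3, so the result is -1.
        System.out.println(naive(max - 1, max));

        // The common fix handles that case and returns max - 1.
        System.out.println(lowPlusHalfDiff(max - 1, max));

        // Bounds of opposite sign: high - low wraps to -1, so the result is
        // min instead of a value near zero.
        System.out.println(lowPlusHalfDiff(min, max));
    }
}
```

The last call is the case the question is about: the difference overflows even though the true midpoint is easily representable.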

UPDATE: Combining the answers from Jander and Peter G. and experimenting a while I got the following formulas for middle value element and its immediate neighbors:

Lowest-midpoint (equal to floor((low + high)/2), e.g. [2 3] -> 2, [2 4] -> 3, [-3 -2] -> -3)

mid = (low & high) + ((low ^ high) >> 1);

Highest-midpoint (equal to ceil((low + high)/2), e.g. [2 3] -> 3, [2 4] -> 3, [-3 -2] -> -2)

low++;
mid = (low & high) + ((low ^ high) >> 1);

Before-midpoint (equal to floor((low + high - 1)/2), e.g. [2 3] -> 2, [2 4] -> 2, [-7 -3] -> -6)

high--;
mid = (low & high) + ((low ^ high) >> 1);

After-midpoint (equal to ceil((low + high + 1)/2), e.g. [2 3] -> 3, [2 4] -> 4, [-7 -3] -> -4)

mid = (low & high) + ((low ^ high) >> 1) + 1;

Or, without bitwise and (&) and xor (^), slightly slower code (x >> 1 can be replaced with floor(x / 2) to obtain formulas free of bitwise operators):

Lowest-midpoint

halfLow = low >> 1; halfHigh = high >> 1;
mid = halfLow + halfHigh + ((low-2*halfLow + high-2*halfHigh) >> 1);

Highest-midpoint

low++;
halfLow = low >> 1; halfHigh = high >> 1;
mid = halfLow + halfHigh + ((low-2*halfLow + high-2*halfHigh) >> 1);

Before-midpoint

high--;
halfLow = low >> 1; halfHigh = high >> 1;
mid = halfLow + halfHigh + ((low-2*halfLow + high-2*halfHigh) >> 1);

After-midpoint

halfLow = low >> 1; halfHigh = high >> 1;
mid = halfLow + halfHigh + ((low-2*halfLow + high-2*halfHigh) >> 1) + 1;

Note: the >> operator above is the signed (arithmetic) right shift, which preserves the sign bit.
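The four bitwise variants can be collected into small helper methods, as sketched below. The class and method names are hypothetical. Instead of mutating `low` or `high`, the helpers pass `low + 1` or `high - 1`, which assumes those adjustments do not wrap (that is, `low < Integer.MAX_VALUE` and `high > Integer.MIN_VALUE` respectively).

```java
class Midpoints {
    // floor((low + high) / 2), computed without ever forming low + high.
    static int lowest(int low, int high) {
        return (low & high) + ((low ^ high) >> 1);
    }

    // ceil((low + high) / 2); assumes low + 1 does not wrap.
    static int highest(int low, int high) {
        return lowest(low + 1, high);
    }

    // floor((low + high - 1) / 2); assumes high - 1 does not wrap.
    static int before(int low, int high) {
        return lowest(low, high - 1);
    }

    // ceil((low + high + 1) / 2), i.e. the lowest midpoint plus one;
    // assumes the result still fits in an int.
    static int after(int low, int high) {
        return lowest(low, high) + 1;
    }

    public static void main(String[] args) {
        // The examples from the text above.
        System.out.println(lowest(-3, -2));  // -3
        System.out.println(highest(2, 3));   // 3
        System.out.println(before(-7, -3));  // -6
        System.out.println(after(2, 4));     // 4
        // Full-range bounds: floor(-1 / 2) is -1.
        System.out.println(lowest(Integer.MIN_VALUE, Integer.MAX_VALUE)); // -1
    }
}
```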

eold asked Jan 30 '11 17:01





3 Answers

From http://aggregate.org/MAGIC/#Average%20of%20Integers:

(low & high) + ((low ^ high) / 2)

is an overflow-proof average of two unsigned integers.

Now, this trick only works on unsigned integers. But because ((a+x) + (b+x))/2 = (a+b)/2 + x, you can fudge it as follows, if you have unsigned integers with the same bit size as your signed integers:

unsigned int u_low  = (unsigned int)low  + MAX_INT + 1u;  /* cast first so the addition is done in unsigned arithmetic */
unsigned int u_high = (unsigned int)high + MAX_INT + 1u;
unsigned int u_avg  = (u_low & u_high) + (u_low ^ u_high)/2;
int avg = u_avg - MAX_INT - 1;

UPDATE: On further thought, this will work even if you don't have unsigned integers. In two's complement, signed and unsigned integers are equivalent over addition, subtraction, and bitwise operations. So all we need to worry about is making sure that the divide acts like an unsigned divide, which we can do by using a shift and masking out the uppermost bit.

low += MAX_INT + 1;
high += MAX_INT + 1;
avg = (low & high) + (((low ^ high) >> 1) & MAX_INT);
avg -= MAX_INT + 1;

(Note that if you're using Java, you can use an unsigned shift, ... >>> 1, instead of (... >> 1) & MAX_INT.)
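For Java specifically, the offset version might look like the sketch below. The class and method names are hypothetical. It uses `Integer.MIN_VALUE` in place of `MAX_INT + 1` (the two are the same bit pattern under wrap-around) and relies on `>>>` for the unsigned halving.

```java
class OffsetAverage {
    static int average(int low, int high) {
        // Adding MIN_VALUE flips the sign bit, which maps the signed range
        // onto the unsigned range while preserving order.
        int uLow = low + Integer.MIN_VALUE;
        int uHigh = high + Integer.MIN_VALUE;

        // Unsigned overflow-free average; >>> shifts in a zero bit, so it
        // acts as an unsigned divide by two.
        int uAvg = (uLow & uHigh) + ((uLow ^ uHigh) >>> 1);

        // Undo the offset to get back to the signed range.
        return uAvg - Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(average(2, 3));                                  // 2
        System.out.println(average(-3, -2));                                // -3
        System.out.println(average(Integer.MIN_VALUE, Integer.MAX_VALUE));  // -1
    }
}
```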

HOWEVER, there's an alternative I stumbled upon that's even simpler, and I haven't yet figured out how it works. There's no need to adjust the numbers by MAX_INT or use unsigned variables or anything. It's simply:

avg = (low & high) + ((low ^ high) >> 1);

Tested with all combinations of 16-bit signed integers low and high in the range -32768..32767, but not yet proven outright (by me anyway).
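One possible explanation, offered as a sketch rather than a proof: in two's complement, low + high = 2 * (low & high) + (low ^ high), because & picks out the bit positions that generate a carry and ^ is the carry-free sum. Halving that with a signed shift gives (low & high) + ((low ^ high) >> 1), which is the floor of the exact average. The check below compares the formula against a 64-bit reference on a reduced range; the class and method names and the chosen ranges are arbitrary.

```java
class MidpointCheck {
    static int mid(int low, int high) {
        return (low & high) + ((low ^ high) >> 1);
    }

    // Reference value computed in 64-bit arithmetic, where the sum cannot overflow.
    static int reference(int low, int high) {
        return (int) Math.floorDiv((long) low + (long) high, 2L);
    }

    static boolean agrees(int low, int high) {
        return mid(low, high) == reference(low, high);
    }

    // Checks every pair in [from, to]; long loop counters avoid wrapping at the int limits.
    static boolean agreesOnRange(int from, int to) {
        for (long lo = from; lo <= to; lo++) {
            for (long hi = from; hi <= to; hi++) {
                if (!agrees((int) lo, (int) hi)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesOnRange(-512, 512));
        System.out.println(agreesOnRange(Integer.MAX_VALUE - 200, Integer.MAX_VALUE));
        System.out.println(agreesOnRange(Integer.MIN_VALUE, Integer.MIN_VALUE + 200));
        System.out.println(agrees(Integer.MIN_VALUE, Integer.MAX_VALUE));
    }
}
```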

Jander answered Oct 05 '22 23:10



int half_low = low/2;
int lsb_low = low - 2*half_low;
int half_high = high/2;
int lsb_high = high - 2*half_high;
int mean = half_low + half_high + (lsb_low + lsb_high)/2;
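A caveat when this is used in Java: `/` truncates toward zero, so the remainders can be -1 and the rounding direction depends on the signs of the inputs (for low = -3, high = 10 the code above yields 4, while the floor of 3.5 is 3). If a consistent floor is wanted, one option is `Math.floorDiv` and `Math.floorMod` (Java 8+). This is a variant sketched under that assumption, not part of the original answer, and the names are hypothetical.

```java
class HalvesMidpoint {
    // The original scheme with Java's truncating division.
    static int truncatingMid(int low, int high) {
        int halfLow = low / 2;
        int lsbLow = low - 2 * halfLow;      // -1, 0 or 1
        int halfHigh = high / 2;
        int lsbHigh = high - 2 * halfHigh;   // -1, 0 or 1
        return halfLow + halfHigh + (lsbLow + lsbHigh) / 2;
    }

    // Same idea with flooring division, so the result is floor((low + high) / 2).
    static int floorMid(int low, int high) {
        int halfLow = Math.floorDiv(low, 2);
        int lsbLow = Math.floorMod(low, 2);    // always 0 or 1
        int halfHigh = Math.floorDiv(high, 2);
        int lsbHigh = Math.floorMod(high, 2);  // always 0 or 1
        return halfLow + halfHigh + (lsbLow + lsbHigh) / 2;
    }

    public static void main(String[] args) {
        System.out.println(truncatingMid(-3, 10)); // 4
        System.out.println(floorMid(-3, 10));      // 3
        System.out.println(floorMid(Integer.MIN_VALUE, Integer.MAX_VALUE)); // -1
    }
}
```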
Peter G. answered Oct 05 '22 22:10



Assuming high >= low, a variant of your initial approach should also work, that is:

low + ((high - low) >>> 1)

where >>> is an unsigned shift (as in Java).

The idea is that high - low never overflows if the result is interpreted as an unsigned integer, so the unsigned shift correctly performs division by 2 and the formula computes the middle value.
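As an illustration of how this could be used, here is a sketch of a binary search over the whole `int` range. The class, method, and predicate names are hypothetical, and the search assumes the predicate is monotone (false for small values, true from some point on) and true at `high`.

```java
import java.util.function.IntPredicate;

class RangeSearch {
    // Middle value for high >= low; high - low is treated as unsigned via >>>.
    static int mid(int low, int high) {
        return low + ((high - low) >>> 1);
    }

    // Smallest x in [low, high] for which isGood.test(x) is true.
    static int firstTrue(int low, int high, IntPredicate isGood) {
        while (low < high) {
            int m = mid(low, high);  // low <= m < high, so m + 1 cannot wrap
            if (isGood.test(m)) {
                high = m;
            } else {
                low = m + 1;
            }
        }
        return low;
    }

    public static void main(String[] args) {
        System.out.println(mid(Integer.MIN_VALUE, Integer.MAX_VALUE)); // -1
        System.out.println(firstTrue(Integer.MIN_VALUE, Integer.MAX_VALUE, x -> x >= -5)); // -5
    }
}
```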

Lorenzo Castelli answered Oct 05 '22 23:10
