Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding missing number using binary search

I am reading book on programming pearls.

Question: Given a sequential file that contains at most four billion 32 bit integers in random order, find a 32-bit integer that isn't in the file (and there must be at least one missing). This problem has to be solved if we have a few hundred bytes of main memory and several sequential files.

Solution: To set this up as a binary search we have to define a range, a representation for the elements within the range, and a probing method to determine which half of a range holds the missing integer. How do we do this?

We'll use as the range a sequence of integers known to contain atleast one missing element, and we'll represent the range by a file containing all the integers in it. The insight is that we can probe a range by counting the elements above and below its midpoint: either the upper or the lower range has atmost half elements in the total range. Because the total range has a missing element, the smaller half must also have a mising element. These are most ingredients of a binary search algorithm for above problem.

Above text is copy right of Jon Bently from programming pearls book.

Some info is provided at following link

"Programming Pearls" binary search help

How do we search by passes using binary search and also not followed with the example given in above link? Please help me understand logic with just 5 integers rather than million integers to understand logic.

like image 435
venkysmarty Avatar asked Sep 07 '12 08:09

venkysmarty


2 Answers

Why don't you re-read the answer in the post "Programming Pearls" binary search help. It explains the process on 5 integers as you ask.
The idea is that you parse each list and break it into 2 (this is where binary part comes from) separate lists based on the value in the first bit.

I.e. showing binary representation of actual numbers Original List "": 001, 010, 110, 000, 100, 011, 101 => (broken into)
(we remove the first bit and append it to the "name" of the new list)
To form each of the bellow lists we took values starting with [0 or 1] from the list above
List "0": 01, 10, 00, 11 (is formed from subset 001, 010, 000, 011 of List "" by removing the first bit and appending it to the "name" of the new list)
List "1": 10, 00, 01 (is formed from subset 110, 100, 101 of List "" by removing the first bit and appending it to the "name" of the new list)

Now take one of the resulting lists in turn and repeat the process:
List "0" becomes your original list and you break it into
List "0***0**" and
List "0***1**" (the bold numbers are again the 1 [remaining] bit of the numbers in the list being broken)

Carry on until you end up with the empty list(s).

EDIT
Process step by step:
List "": 001, 010, 110, 000, 100, 011, 101 =>
List "0": 01, 10, 00, 11 (from subset 001, 010, 000, 011 of the List "") =>
List "00": 1, 0 (from subset 01, 00 of the List "0") =>
List "000": 0 [final result] (from subset 0 of the List "00")
List "001": 1 [final result] (from subset 1 of the List "00")
List "01": 0, 1 (from subset 10, 11 of the List "0") =>
List "010": 0 [final result] (from subset 0 of the List "01")
List "011": 1 [final result] (from subset 1 of the List "01")
List "1": 10, 00, 01 (from subset 110, 100, 101 of the List "") =>
List "10": 0, 1 (from subset 00, 01 of the List "1") =>
List "100": 0 [final result] (from subset 0 of the List "10")
List "101": 1 [final result] (from subset 1 of the List "10")
List "11": 0 (from subset 10 of the List "1") =>
List "110": 0 [final result] (from subset 0 of the List "11")
List "111": absent [final result] (from subset EMPTY of the List "11")

The positive of this method is that it will allow you to find ANY number of missing numbers in the set - i.e. if more than one is missing.

P.S. AFAIR for 1 single missing number out of the complete range there is even more elegant solution of XOR all numbers.

like image 86
Germann Arlington Avatar answered Sep 28 '22 04:09

Germann Arlington


The idea is to solve easier problem:

Is the missing value in range [minVal, X] or (X, maxVal). If you know this, you can move X and check again.

For example, you have 3, 4, 1, 5 (2 is missing). You know that minVal = 1, maxVal = 5.

  1. Range = [1, 5], X = 3, there should be 3 integers in range [1, 3] and 2 in range [4, 5]. There are only 2 in range [1, 3], so you are looking in range [1, 3]
  2. Range = [1, 3], X = 2. There are only 1 value in range [1, 2], so you are looking in range [1, 2]
  3. Range = [1, 2], X = 1. There are no values in range [2, 2] so it is your answer.

EDIT: Some pseudo-C++ code:

minVal = 1, maxVal = 5; //choose correct values
while(minVal < maxVal){
    int X = (minVal + maxVal) / 2
    int leftNumber = how much in range [minVal, X]
    int rightNumber = how much in range [X + 1, maxVal]
    if(leftNumber < (X - minVal + 1))maxVal = X
    else minVal = X + 1
}
like image 32
Ari Avatar answered Sep 28 '22 04:09

Ari