Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find missing integer in an array without using sum

I'm working through the hard exercises from the "Cracking the Coding Interview" book. One of the questions is the following:

17.4 Missing Number: An array A contains all the integers from 0 to n, except for one number which is missing. In this problem, we cannot access an entire integer in A with a single operation. The elements of A are represented in binary, and the only operation we can use to access them is "fetch the jth bit of A[i]", which takes constant time. Write code to find the missing integer. Can you do it in O(n) time?

The trivial answer to this is sum([1, n]) - sum(array). It can even be done bit by bit without using the sum function by simply XOR-ing the numbers. However, the book says:

The runtime of this solution is n * length(n), where length is the number of bits in n. Note that length(n) = log(n). So, the runtime is actually O(n log(n)). Not quite good enough!

Although, for all practical purposes, an integer is represented by a fixed number of bits in most programming languages (Java uses 32 bit), so length(n) is a constant, but interview questions are rarely practical, so, we toil on.

I quote the rest of the answer here, then, I'll ask my question.

So how else can we approach it?
We can actually use a similar approach, but leverage the bit values more directly.

Picture a list of binary numbers (the ----- indicates the value that was removed):

00000   00100   01000   01100
00001   00101   01001   01101
00010   00110   01010
-----   00111   01011

Removing the number above creates an imbalance of 1s and 0s in the least significant bit, which we'll call LSB1. In a list of numbers from 0 to n, we would expect there to be the same number of 0s and 1s (if n is odd), or an additional 0 if n is even. That is:

if n % 2 == 1 then count(0s) = count(1s)
if n % 2 == 0 then count(0s) = 1 + count(1s)

Note that this means that count(0s) is always greater than or equal to count(1s).

When we remove a value v from the list, we'll know immediately if v is even or odd just by looking at the least significant bits of all the other values in the list.

n % 2 == 0
count(0s) = 1 + count(1s)
n % 2 == 1
count(0s) = count(1s)
v % 2 == 0, LSB(v) = 0 a 0 is removed.
count(0s) = count(1s)
a 0 is removed.
count(0s) < count(1s)
v % 2 == 1, LSB(v) = 1 a 1 is removed.
count(0s) > count(1s)
a 1 is removed.
count(0s) > count(1s)

So, if count(0s) <= count(1s), then v is even. If count(0s) > count(1s), then v is odd.
We can now remove all the evens and focus on the odds, or remove all the odds and focus on the evens.

Okay, but how do we figure out what the next bit in v is? If v were contained in our (now smaller) list, then we should expect to find the following:
count(0s) = count(1s) OR count(0s) = 1 + count(1s)

As in the earlier example, we can deduce the value of the second least significant bit of v.

count(0s) = 1 + count(1s) count(0s) = count(1s)
LSB(v) == 0 a 0 is removed.
count(0s) = count(1s)
a 0 is removed.
count(0s) < count(1s)
LSB(v) == 1 a 1 is removed.
count(0s) > count(1s)
a 1 is removed.
count(0s) > count(1s)

Again, we have the same conclusion.
If count(0s) <= count(1s), then LSB2(v) = 0.
If count(0s) > count(1s), then LSB2(v) = 1.

We can repeat this process for each bit. On each iteration, we count the number of 0s and 1s in bit i to check if LSBi(v) is 0 or 1. Then, we discard the numbers where LSBi(x) != LSBi(v). That is, if v is even, we discard the odd numbers, and so on.

By the end of the process, we will have computed all bits in v. In each successive iteration, we look at n, then n/2, then n/4, and so on, bits. This results in a runtime of O(n).

I get that count(0s) >= count(1s) is established by observation for numbers from 0 to n. However, once half the numbers are removed, how does count(0s) >= count(1s) still hold true?

The algorithm, obviously, depends on it, and uses it to reduce the problem size in half at each iteration.

like image 697
Abhijit Sarkar Avatar asked Sep 17 '25 05:09

Abhijit Sarkar


1 Answers

One way to understand why num(0s) >= num(1s) still holds true is to look at the two operations performed by this algorithm at each step.

At each step, this algorithm does 2 things:

  1. delete odd numbers OR delete even numbers (depending on the sum on the column)
  2. remove least significant bit

Now let's look what it means on the numbers. Let's write N numbers ordered:

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
...

You notice that the sequence of least significant bit (the last column) is 0101010101.... Let's do the first operation (for instance let's choose to delete odd numbers), we get:

0000
0010
0100
0110
1000
1010
1100
1110
...

Now let's do the second operation:

000
001
010
011
100
101
110
111
...
  • You realize that the new sequence of least significant bit becomes: 01010101010101... which is exactly the same sequence we had at the beginning of the step. It will satisfy as well: num(0s) >= num(1s)

  • You can also notice that if you did choose to remove even numbers instead of odds, it would give you the exact same sequence.

  • You can notice that if you repeat this operation, it will be again this sequence and so on (you can demonstrate that mathematically if you want by assuming Hn and then proving Hn+1).

Note: I am not sure this algorithm is actually better than the trivial one, even in theory: In computer science (if i remember well my engineering lessons), the complexity is defined by the number of operation a Turing machine has to do to complete an algorithm. If n was extremely big, let's say 2 power 100 billion: then each number contains 100 billion digits. But a Turing machine contains a single band where everything is written. So the head of the Turing machine needs to move through each digit (for instance to move from first number to third one because we removed the second one, we need to reach the third one, so we need to pass at least once through each digit of second number). I am not sure of this last part.

like image 126
Vince M Avatar answered Sep 19 '25 06:09

Vince M