Find the duplicate element in an array of consecutive integers in O(log n) time

Question

A couple of days ago, in an interview I was asked for a program that would find the single duplicate element in a consecutive integer array in O(log n) time.

The case is somewhat specific, as there are a total of 11 integers (1 to 10, in that order) plus, a single duplicate of any of these numbers, inserted somewhere in between.

I was given a sample similar to this:

{1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

So, now I've come up with the following C code:

#include <stdio.h>

int main (void)
{
    int a[11] = {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}, first=0, last=10, mid;

    while (1)
    {
        mid = (first + last)/2;

        if (mid+1 == a[mid])
            first = mid+1;

        else if ((mid == 0) || (mid == 10) || (a[mid+1] - a[mid-1] == 1))   /* Order is Important, (a[mid+1] - a[mid-1] == 1) must be last */
            break;

        else
            last = mid-1;
    }

    printf("Duplicate element is %d, located at position no. %d.", a[mid], mid+1);

    return 0;
}

Would this properly satisfy the O(log n) criteria? And, are there any alternatives/improvements over this?

kraskevich · Accepted Answer

Yes, it has O(log n) time complexity.

It is possible to make the code more clear using the following fact: you need to find the smallest i such that a[i] != i + 1, so it can be implemented in a more concise way:

//invariant: the [0...low] prefix does not contain a duplicate
//           the [0...high] prefix contains a duplicate
low = 0 //[0...0] prefix obviously does not contain a duplicate
high = 10 //the entire array obviously contains a duplicate
while high - low > 1:
    mid = (low + high) / 2 
    if a[mid] != mid + 1:
        high = mid
    else:
        low = mid
print(a[high], high)

Amit Sharma · Answer

We can modify the binary search algorithm to get the solution. In binary search, we have a key and we used this key to find its position by bisecting the array size. Here, we don't have a key, instead we have to find it. But the behavior of duplicate element can be used to bisect the array size. How ? lets see:

On carefully looking the data, we can easily see that after inserting the duplicate element at random position (say index) in consecutive element array the property of elements will change (a[i] == i+1 --> a[i] != i+1) from the position index (including the index). Now this changed property can be used to bisect the array size. Hence, we can find out the duplicate in O(log(n)).

For example, Consider your given array: {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

{1,  2,  3,   4,  5,  6,  2,  7,  8,  9,  10}
                          ||
                          from this position the the property of (a[i] == i+1) will no more satisfied.

This property can be model to bisect the array size in solution.

void  binary_duplictae_finder(int a[], int low, int high) {

   int mid=(low+high)/2;

   if(high - low > 1){
          if(a[mid]!=mid+1)
              binary_duplictae_finder(a, low, mid);
          else
              binary_duplictae_finder(a, mid, high);
   }

   if(high==low+1)
      printf("%d ", a[high]);
 }

Find the duplicate element in an array of consecutive integers in O(log n) time

Tags:

arrays

c

algorithm

optimization

duplicates

primekrieger

2 Answers

kraskevich

Amit Sharma

Recent Activity

Donate For Us

Find the duplicate element in an array of consecutive integers in O(log n) time

Tags:

arrays

c

algorithm

optimization

duplicates

primekrieger

2 Answers

kraskevich

Amit Sharma

Related questions

Recent Activity

Donate For Us