Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find duplicate element in array in time O(n)

I have been asked this question in a job interview and I have been wondering about the right answer.

You have an array of numbers from 0 to n-1, one of the numbers is removed, and replaced with a number already in the array which makes a duplicate of that number. How can we detect this duplicate in time O(n)?

For example, an array of 4,1,2,3 would become 4,1,2,2.

The easy solution of time O(n2) is to use a nested loop to look for the duplicate of each element.

like image 568
Marwan Tushyeh Avatar asked Feb 18 '13 20:02

Marwan Tushyeh


People also ask

What is the time complexity of finding duplicates in an array?

Now navigate the array and check the adjacent elements to check for duplicates. Time Complexity : O(nlogn) Space Complexity: O(n) by using merge sort.

What is the time complexity for counting duplicates of an integer in unsorted integer array of size n?

So the time complexity is O(n).


2 Answers

This can be done in O(n) time and O(1) space.

(The algorithm only works because the numbers are consecutive integers in a known range):

In a single pass through the vector, compute the sum of all the numbers, and the sum of the squares of all the numbers.

Subtract the sum of all the numbers from N(N-1)/2. Call this A.

Subtract the sum of the squares from N(N-1)(2N-1)/6. Divide this by A. Call the result B.

The number which was removed is (B + A)/2 and the number it was replaced with is (B - A)/2.

Example:

The vector is [0, 1, 1, 2, 3, 5]:

  • N = 6

  • Sum of the vector is 0 + 1 + 1 + 2 + 3 + 5 = 12. N(N-1)/2 is 15. A = 3.

  • Sum of the squares is 0 + 1 + 1 + 4 + 9 + 25 = 40. N(N-1)(2N-1)/6 is 55. B = (55 - 40)/A = 5.

  • The number which was removed is (5 + 3) / 2 = 4.

  • The number it was replaced by is (5 - 3) / 2 = 1.

Why it works:

  • The sum of the original vector [0, ..., N-1] is N(N-1)/2. Suppose the value a was removed and replaced by b. Now the sum of the modified vector will be N(N-1)/2 + b - a. If we subtract the sum of the modified vector from N(N-1)/2 we get a - b. So A = a - b.

  • Similarly, the sum of the squares of the original vector is N(N-1)(2N-1)/6. The sum of the squares of the modified vector is N(N-1)(2N-1)/6 + b2 - a2. Subtracting the sum of the squares of the modified vector from the original sum gives a2 - b2, which is the same as (a+b)(a-b). So if we divide it by a - b (i.e., A), we get B = a + b.

  • Now B + A = a + b + a - b = 2a and B - A = a + b - (a - b) = 2b.

like image 65
rici Avatar answered Oct 06 '22 01:10

rici


We have the original array int A[N]; Create a second array bool B[N] too, of type bool=false. Iterate the first array and set B[A[i]]=true if was false, else bing!

like image 43
qPCR4vir Avatar answered Oct 06 '22 01:10

qPCR4vir