Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to find a duplicate entry in constant space and O(n) time

Tags:

c++

algorithm

Given an array of N integer such that only one integer is repeated. Find the repeated integer in O(n) time and constant space. There is no range for the value of integers or the value of N

For example given an array of 6 integers as 23 45 67 87 23 47. The answer is 23 (I hope this covers ambiguous and vague part)

I searched on the net but was unable to find any such question in which range of integers was not fixed. Also here is an example that answers a similar question to mine but here he created a hash table with the highest integer value in C++.But the cpp does not allow such to create an array with 2^64 element(on a 64-bit computer).

I am sorry I didn't mention it before the array is immutable

like image 741
Anubhav Agarwal Avatar asked Nov 24 '11 16:11

Anubhav Agarwal


People also ask

How do you find duplicates in array O space?

Naive approach: Use 2 loops. Check each element in the array with all other elements in the array and check if it has the same value. Sorting approach: Sort the array, this will bring all the duplicates together if present. Now navigate the array and check the adjacent elements to check for duplicates.

Is there any other possible algorithm that can help you find the duplicate student ids in the given data?

The first solution is the brute force algorithm, which is demonstrated by finding duplicate elements on integer array, but you can use the logic to find a duplicate on any kind of array.


2 Answers

Jun Tarui has shown that any duplicate finder using O(log n) space requires at least Ω(log n / log log n) passes, which exceeds linear time. I.e. your question is provably unsolvable even if you allow logarithmic space.

There is an interesting algorithm by Gopalan and Radhakrishnan that finds duplicates in one pass over the input and O((log n)^3) space, which sounds like your best bet a priori.

Radix sort has time complexity O(kn) where k > log_2 n often gets viewed as a constant, albeit a large one. You cannot implement a radix sort in constant space obviously, but you could perhaps reuse your input data's space.

There are numerical tricks if you assume features about the numbers themselves. If almost all numbers between 1 and n are present, then simply add them up and subtract n(n+1)/2. If all the numbers are primes, you could cheat by ignoring the running time of division.

As an aside, there is a well-known lower bound of Ω(log_2(n!)) on comparison sorting, which suggests that google might help you find lower bounds on simple problems like finding duplicates as well.

like image 170
Jeff Burdges Avatar answered Sep 28 '22 06:09

Jeff Burdges


If the array isn't sorted, you can only do it in O(nlogn).

Some approaches can be found here.

like image 25
Igor Avatar answered Sep 28 '22 04:09

Igor