Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find duplicate in array with a memory efficient approach

Tags:

A is an array of integers.

All the values are between 0 to A.Length-1

it means 0 <= A[i] <= A.Length-1

I am supposed to find repeating elements; and if there are several repeating elements, then choose the one that has lower index for the repeated item.

for example:

a = [3, 4, 2, 5, 2, 3] 

then

result = 2 

This was an interview question. I used another array to store items and check when it is repeating. Then it gave me time-out for some test cases. The interviewer advised to only loop over the array only once, and do not create any additional data structure.

like image 407
Kiana Montazeri Avatar asked Aug 29 '18 13:08

Kiana Montazeri


People also ask

How do you find duplicate elements in an array in a single loop?

The two common approaches to solve the problem are: Using a HashSet - populate it while iterating and abort if you find a match - O(n) time on average and O(n) space. Sort and iterate, after the array is sorted, duplicates will be adjacent to each other and easy to detect.


1 Answers

No need for another data structure. You can use the input itself as a hashset.

Every time you see a value, add A.Length to the item that corresponds to that index. As values might have been already incremented, you should look at the value as A[i] mod A.length.

If you find an item that is already >= A.length.. you have a repetition. (Remember that the problem states that all items are in the interval [0, A.Length-1])

Track the lowest index that has been found as repeated.

This results in O(N) complexity (single pass) and no use of an additional data structure, i.e. Size O(1)

The key concept behind this approach is that hashsets work this way. Conceptually, this is indirectly related to the pigeonhole principle. https://en.wikipedia.org/wiki/Pigeonhole_principle

Note: During the interview it would be important to ask implementation specific questions, discuss limitations, assumptions, etc.: - What is the data type of the items in the list? - if values are in the range [0..A.length-1], are all items unsigned or can I use negative numbers if I wanted? - etc.

During the interview, I would not claim this is a perfect answer, instead, I would discuss the assumptions with the interviewer and adjust accordingly. For instance, another answer suggested using negative numbers but it is possible that the data type of items is an unsigned type, etc.

The interview is supposed to trigger a technical discussion to explore both your knowledge and creativity.

like image 52
Juan Leni Avatar answered Oct 21 '22 22:10

Juan Leni