Algo to find duplicates in a very large array

Question

During one of technical interview got this question. I know way to solve this problem using (in java) HashSet.

But i could not understand when interviewer forced on the word "a very large array let's say 10 million elements in the given array".

Do i need to change the approach? If not, what should be the efficient to achieve this?

PS: Algo or implementation is language agnostic.

Thank you.

YoungHobbit · Accepted Answer

There were some key things, which interviewer expected you to ask back like: if you can not load the array in memory, then how much I can load. These are the steps to solve the problem:

you need to divide the array in how much memory is available to you.
Let's say you can load 1M number at a time. You have split the data in k parts. You load the first 1M and build Min Heap of it. Then remove the top and apply the Heapify on Min Heap.
Repeat the same for other parts of the data.
Now you will have K sorted splits.
Now fetch a first number from each K split and again build a Min Heap.
Now remove the top from the Min Heap and store the value in temporary variable as well for comparing with the next coming number for finding the duplicates.
Now fetch the next number from the same split (part) whose number got removed last time. Put that number on top of Min Heap and apply Heapify.
Now the top of the Min Heap is your next sorted number and compare it with the temporary variable for finding the duplicates. Update thetemporary variable` if number is not duplicate.

Reblochon Masque · Answer

You can do it in O(nlog(n)):

Sort the array
find the duplicates (they will be next to each other) in one pass.

I think that is what the interviewer wanted to hear.

if you did a merge sort or a quick sort, finding the duplicates could be done at merging in hidden time. These can be implemented "in-place", or "by-part" if the array is too large to fit in memory.

Algo to find duplicates in a very large array

Tags:

java

python

javascript

c

algorithm

Sam

2 Answers

YoungHobbit

Reblochon Masque

Recent Activity

Donate For Us

Algo to find duplicates in a very large array

Tags:

java

python

javascript

c

algorithm

Sam

2 Answers

YoungHobbit

Reblochon Masque

Related questions

Recent Activity

Donate For Us