Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Duplicates in an array in O(N) time

Tags:

c++

algorithm

Is there a way to find all the duplicate elements in an array of N elements in O(N) time?

Example:

Input: 11, 29, 81, 14, 43, 43, 81, 29

Output: 29, 81, 43

Sorting the input and doing a linear scan to detect duplicates destroys the order and gives the output: 29,43,81.

Sorting-by-key another array of indices {0,1,...N-1} according to the given array to get {1,4,2} and then sorting the resultant set of indices to get {1,2,4} will give us {29,81,43}, but this takes O(N logN) time.

Is there an O(N) algorithm to solve this problem?

P.S. I forgot to add: I dont want to use hash tables. I am looking for a non-hash solution.

like image 928
alpha_cod Avatar asked Oct 01 '11 06:10

alpha_cod


People also ask

How do you find duplicates in array?

Duplicate elements can be found using two loops. The outer loop will iterate through the array from 0 to length of the array. The outer loop will select an element. The inner loop will be used to compare the selected element with the rest of the elements of the array.

What is the time complexity for counting duplicates of an integer in an sorted integer array of size n?

So the time complexity is O(n).

How do you find duplicate elements in an array using single loop?

The two common approaches to solve the problem are: Using a HashSet - populate it while iterating and abort if you find a match - O(n) time on average and O(n) space. Sort and iterate, after the array is sorted, duplicates will be adjacent to each other and easy to detect.


1 Answers

I believe a good solution (decent memory usage, can be used to immediately determine if an entry has already been seen thus preserving order, and with a linear complexity) is a trie.

If you insert the elements into the trie as if they were a string with each digit (starting from the MSD) in each node, you can pull this off with a complexity of O(m N) where m is the average length of numbers in base-10 digits.

You'd just loop over all your entries and insert them into the trie. Each time an element already exists, you skip it and move on to the next. Duplicates in this (unlike in my previous answer of a Radix Sort) will be found immediately instead of in the last iteration or what not.

I'm not sure if you would benefit from using a suffix tree here, as the "base" of the characters being entered into the trie is only 10 (compared to the base-128 for ANSI strings), but it's possible.

like image 175
Mahmoud Al-Qudsi Avatar answered Oct 16 '22 12:10

Mahmoud Al-Qudsi