Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the first duplicate in an int array, java

Tags:

Here is a common interview question i came across, however i failed to improve it in the way it demands.

assume we have an int array int[] A, we want to find the first duplicate entry.  
  1. almost everyone can think of using a HashSet, and add to it while parsing.this will lead to O(n) time and O(n) space. After this I was asked to solve it without other data structures. I said the dumbest idea will be comparing each one in O(n^2) time. And then I was asked to improve the O(n^2) time.

  2. And to improve it, i thought of using a fixed size array(assuming the maximum number is n), boolean[] b = new boolean[n]; however I was not allowed to use this method.

  3. Then I thought of using an int variable, using bit manipulation, if the maximum number is smaller than 32, then for n we can push 1 to n bits left and | to a checker, then & the checker to the next entry in the array to check if it's > 0. e.g.:

    int c = A[i]; if(check & (1 << c) > 0) return false; check |= 1 << c; 

however this is not allowed either.

So there was an hint that I can use the array itself as hashset/hashtable, and "linear hashing"?

any help ? thanks

like image 587
Ruobo Wang Avatar asked May 26 '12 15:05

Ruobo Wang


People also ask

How do you find duplicate numbers in an integer array?

One of the most common ways to find duplicates is by using the brute force method, which compares each element of the array to every other element. This solution has the time complexity of O(n^2) and only exists for academic purposes.

How do you check if there are duplicates in an array Java?

One more way to detect duplication in the java array is adding every element of the array into HashSet which is a Set implementation. Since the add(Object obj) method of Set returns false if Set already contains an element to be added, it can be used to find out if the array contains duplicates in Java or not.


2 Answers

Linear hashing as defined by Wikipedia has the advantage that resizing occurs incrementally, because buckets are split one-by-one in round-robin fashion, retaining constant amortized time complexity for insertion with resize. Their idea therefore is to iterate over the array, reusing the elements already iterated over as storage for linear hashing.

While I am far from an expert on linear hashing, I don't see any way to fit the hash table in the array. Granted, to store n elements with linear hashing, you might get by with using n buckets. However, the number of elements in a bucket being unbounded, you'd need something like a linked list to implement each bucket, which costs an additional O(n) memory for pointers.

As such, this algorithm does not yield a better asymptotic space complexity than an ordinary HashSet. It does reduce memory consumption by a constant factor, though.

Its time complexity is on par with the ordinary HashSet.

Edit: It appears to me this answer is being ignored (no votes, no comments). Is it not useful? Please comment so I know what to improve.

like image 128
meriton Avatar answered Dec 25 '22 07:12

meriton


I have this one idea: as you progress down the array, you sort the part that you have visited. By employing binary search you'll improve time; space is 0. The sort itself is... insertion sort? You're basically running the sort as normal, but as you search for the place to insert the new numeber, if you hit the number itself, you shout "bingo". That's an improvement over zero space + O(n2) time.

like image 45
Marko Topolnik Avatar answered Dec 25 '22 07:12

Marko Topolnik