Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given an array with multiple repeated entries, fnd one repeated entry O(N) time and constant space

We have been given an array of size N that contains integers in the range 0 to N-2, both inclusive.

The array can have multiple repeated entries. We need to find one of the duplicated entries in O(N) time and constant space.

I was thinking of taking the product and sum of all the entires in the array, and the product and sum of all the numbers in the range 0 to N-2.

Then, the difference of the sums and the division of the products would give us two equations. This approach would work if it were given that there are only two repeated entries, but since there can be more than two, I think my approach fails.

Any suggestions?

Edit: The array is immutable. I realize that this is an important piece of information and I apologize that I forgot to include this earlier.

like image 523
efficiencyIsBliss Avatar asked Nov 20 '10 17:11

efficiencyIsBliss


4 Answers

Here's a nice treatment. It passes through some easier problems before addressing this one.

http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

It contains a solution for when you can modify the input array, and another for when you can't.

Brief summary in case the link ever dies: the array indexes run from 0 .. N-1, and the array values run 0 .. N-2. Each array element can therefore be considered as an index (or "pointer") into the array itself: element i "points to" element ra[i], ra[i] points to ra[ra[i]] and so on. By repeatedly following these pointers, me must eventually enter a cycle, because we certainly can't go forever without revisiting some node or other.

Now, the very last element, N-1, isn't pointed to by any other element. So if we start there and eventually enter a cycle, somewhere along the way there must be an element which can be reached from two different places: the route we took the first time, and the route which is part of the cycle. Something like this:

  N-1 -> a1 -> a2 -> a3
               ^       \
              /         v
            a6 <- a5 <- a4

In this case, a2 is reachable from two different places.

But a node which is reachable from two different places is precisely what we're looking for, a duplicate in the array (two different array elements containing the same value).

The question then is how to identify a2, and the answer is to use Floyd's cycle-finding algorithm. In particular it tells us the "start" of the loop in O(N) time and O(1) space.

like image 155
Steve Jessop Avatar answered Nov 19 '22 18:11

Steve Jessop


Assuming we are allowed to change array in place swap each element as you go through the array with the element on that "position" (e.g. if current element is curr then swap it with a[curr]) but if a[curr] already has curr then you know curr is a duplicate.

a = array...
for i = 0; i < length(a); i++
  curr = a[i]
  if a[curr] == curr:
    return duplicate curr
  swap(a[i], a[curr])
  # Now a[curr] == curr and so if it happens again we know it is a duplicate. 

This would be O(n) and constant space.

like image 33
Ajay Avatar answered Nov 19 '22 18:11

Ajay


Scan the array and add each element to a set. If the item already exists in the set - you have a dupe.

like image 1
Fortyrunner Avatar answered Nov 19 '22 18:11

Fortyrunner


Initialize a bit array of size N-2 with all entries to 0. Each index will represent all of your items in the range 0 to N-2.

Loop through your array and add items to your bitarray by setting bitarray[number] == 1. If element already contains a 1, then you've already added your element, return immediately.

If you get to the end of your array without finding a duplicate, return -1.

like image 1
Juliet Avatar answered Nov 19 '22 18:11

Juliet