 

Find a duplicate in array of integers

This was an interview question.

I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.

After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and use the pigeonhole principle to determine which half contains more input elements than it has distinct values. The algorithm continues recursively on that half until it reaches an interval [X,X] where X occurs at least twice, so X is a duplicate.
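
The bisection idea described above could be sketched roughly as follows. This is only an illustration under the question's assumptions (n + 1 integers, each in [1,n]); the function name `findDuplicateByBisection` is made up for this sketch and is not from the interview.

```javascript
// Sketch of the counting/bisection idea: O(n log n) time, O(1) extra space.
// Assumes arr holds n + 1 integers, each in [1, n]. arr is never modified.
function findDuplicateByBisection(arr) {
  let lo = 1;
  let hi = arr.length - 1; // n
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    // Count how many elements fall into the lower half [lo, mid].
    let count = 0;
    for (const v of arr) {
      if (v >= lo && v <= mid) count++;
    }
    // Pigeonhole: more elements than distinct values means a duplicate is in there.
    if (count > mid - lo + 1) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return lo; // the interval has shrunk to [X, X]
}

console.log(findDuplicateByBisection([2, 3, 4, 1, 5, 4, 6, 7, 8])); // 4
```

Each round scans the whole array once and halves the value range, which is where the O(n log n) bound comes from.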

The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.

EDIT: To make things even more complicated, the interviewer mentioned that the input array should not be modified.

Rose M, asked Feb 17 '18 12:02


3 Answers

  1. Take the very last element (x).

  2. Save the element at position x (y).

  3. If x == y you found a duplicate.

  4. Overwrite position x with x.

  5. Assign x = y and continue with step 2.

You are basically sorting the array, which is possible because you know where each element has to go. This takes O(1) extra space and O(n) time. You just have to be careful with the indices: for simplicity I assumed the first index is 1 here (not 0), so we don't have to do +1 or -1.
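
A minimal sketch of the five steps above, translated to a 0-based JavaScript array (so "position x" becomes index `x - 1`). The function name `findDuplicateInPlace` is only illustrative, and the sketch assumes n + 1 integers in [1,n] as in the question. Note that it overwrites the input.

```javascript
// In-place variant: O(n) time, O(1) extra space, but it MODIFIES arr.
// Assumes arr holds n + 1 integers, each in [1, n].
function findDuplicateInPlace(arr) {
  let x = arr[arr.length - 1]; // step 1: take the very last element
  while (true) {
    const y = arr[x - 1];      // step 2: save the element at (1-based) position x
    if (x === y) return x;     // step 3: position x already holds x, so x is a duplicate
    arr[x - 1] = x;            // step 4: put x where it belongs
    x = y;                     // step 5: continue with the displaced element
  }
}

console.log(findDuplicateInPlace([2, 3, 4, 1, 5, 4, 6, 7, 8])); // 4
```

Every pass through the loop either returns or puts one more value into its own position, so under the stated assumptions the loop cannot run more than n + 1 times.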

Edit: without modifying the input array

This algorithm is based on the idea that if we find the entry point of the permutation cycle, we have also found a duplicate (again using a 1-based array for simplicity):

Example:

2 3 4 1 5 4 6 7 8

Entry: 8 7 6

Permutation cycle: 4 1 2 3

As we can see the duplicate (4) is the first number of the cycle.

  1. Finding the permutation cycle

    1. x = last element
    2. x = element at position x
    3. repeat step 2. n times (in total), where n is the array length as in the examples below; this guarantees that we have entered the cycle
  2. Measuring the cycle length

    1. a = last x from above, b = last x from above, counter c = 0
    2. a = element at position a, b = element at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward in the cycle with a)
    3. if a == b the cycle length is c, otherwise continue with step 2.
  3. Finding the entry point to the cycle

    1. x = last element
    2. x = element at position x
    3. repeat step 2. c times (in total)
    4. y = last element
    5. if x == y then x is a solution (x made one full cycle and y is just about to enter the cycle)
    6. x = element at position x, y = element at position y
    7. repeat steps 5. and 6. until a solution was found.

The 3 major steps are each O(n) and run sequentially, so the overall time complexity is also O(n), and the space complexity is O(1).
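
The three phases might look like this in JavaScript. This is a sketch, not a definitive implementation: `findDuplicateByCycle` and the helper `at` are names invented here, the array is 0-based (so `at(p)` reads 1-based position p), and the input is assumed to hold n + 1 integers in [1,n].

```javascript
// Read-only variant: O(n) time, O(1) extra space, arr is never written to.
function findDuplicateByCycle(arr) {
  const len = arr.length;
  const at = (p) => arr[p - 1]; // element at 1-based position p

  // Phase 1: walk len steps from the last element so that x is inside the cycle.
  let x = at(len);
  for (let i = 0; i < len; i++) x = at(x);

  // Phase 2: measure the cycle length c (a moves 1 step, b moves 2 steps).
  let a = x;
  let b = x;
  let c = 0;
  do {
    a = at(a);
    b = at(at(b));
    c++;
  } while (a !== b);

  // Phase 3: give x a head start of c steps, then advance x and y together.
  x = at(len);
  for (let i = 0; i < c; i++) x = at(x);
  let y = at(len);
  while (x !== y) {
    x = at(x);
    y = at(y);
  }
  return x; // entry point of the cycle, which is a duplicate
}

console.log(findDuplicateByCycle([2, 3, 4, 1, 5, 4, 6, 7, 8])); // 4
```

Because x starts exactly one cycle length ahead of y, the two first coincide when y reaches the entry point of the cycle.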

Example from above:

  1. x takes the following values: 8 7 6 4 1 2 3 4 1 2

  2. a takes the following values: 2 3 4 1 2
    b takes the following values: 2 4 2 4 2
    therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)

  3. x takes the following values: 8 7 6 4 | 1 2 3 4
    y takes the following values: | 8 7 6 4
    x == y == 4 in the end and this is a solution!

Example 2 as requested in the comments: 3 1 4 6 1 2 5

  1. Entering cycle: 5 1 3 4 6 2 1 3

  2. Measuring cycle length:
    a: 3 4 6 2 1 3
    b: 3 6 1 4 2 3
    c = 5

  3. Finding the entry point:
    x: 5 1 3 4 6 | 2 1
    y: | 5 1
    x == y == 1 is a solution

maraca, answered Nov 12 '22 02:11


Here is a possible implementation:

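function checkDuplicate(arr) {
  console.log(arr.join(", "));
  const len = arr.length;
  let pos = 0;      // index currently being visited
  let done = 0;     // iterations so far, capped at len
  let cur = arr[0]; // value read at pos
  while (done < len) {
    if (pos === cur) {
      // Fixed point (arr[pos] === pos): move on to the next index.
      cur = arr[++pos];
    } else {
      // Follow the value to the index it names.
      pos = cur;
      if (arr[pos] === cur) {
        // That index already holds its own value, so cur occurs twice.
        console.log(`> duplicate is ${cur}`);
        return cur;
      }
      cur = arr[pos];
    }
    done++;
  }
  console.log("> no duplicate");
  return -1;
}

// The test inputs are 0-based: values lie in [0, arr.length - 1].
for (const t of [
  [0, 1, 2, 3],
  [0, 1, 2, 1],
  [1, 0, 2, 3],
  [1, 1, 0, 2, 4],
]) checkDuplicate(t);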

It is basically the solution proposed by @maraca (typed too slowly!). It has constant space requirements (for the local variables) and apart from that only uses the original array. It is O(n) in the worst case, because the loop runs at most len iterations and terminates as soon as a duplicate is found. Note that the walk can get trapped in a cycle that contains no duplicate, so an input such as [1, 0, 3, 3] is reported as having no duplicate.

Aurel Bílý, answered Nov 12 '22 01:11


If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:

Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".

  1. Set i to 0 and do the following loop:
    1. Increment i until item i is unflagged.
    2. Set j to i and do the following loop:
      1. Set j to the absolute value of vector[j] (the item at j may already have been flagged, i.e. negated).
      2. If the item at j is flagged, j is a duplicate. Terminate both loops.
      3. Flag the item at j.
      4. If j != i, continue the inner loop.
  2. Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
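
One possible rendering of these steps in JavaScript, offered as a sketch only. The name `findDuplicateByFlagging` is invented for illustration, the vector is assumed to hold n + 1 integers in [1,n], and 1-based position p is mapped to index `p - 1`. Values are read through `Math.abs` because a flagged item is stored negated.

```javascript
// Flag-by-negation variant: O(n) time, O(1) extra space.
// The vector is modified temporarily and restored before returning.
function findDuplicateByFlagging(vec) {
  const isFlagged = (p) => vec[p - 1] < 0; // p is a 1-based position
  let duplicate = -1;

  outer:
  for (let i = 1; i <= vec.length; i++) {
    if (isFlagged(i)) continue;       // skip positions already visited
    let j = i;
    do {
      j = Math.abs(vec[j - 1]);       // follow the value, ignoring the flag sign
      if (isFlagged(j)) {             // position j was reached before: duplicate
        duplicate = j;
        break outer;
      }
      vec[j - 1] = -vec[j - 1];       // flag position j
    } while (j !== i);                // stop once the walk has closed a cycle
  }

  // Unflag everything to restore the original contents.
  for (let k = 0; k < vec.length; k++) vec[k] = Math.abs(vec[k]);
  return duplicate;
}

console.log(findDuplicateByFlagging([2, 3, 4, 1, 5, 4, 6, 7, 8])); // 4
```

Each position is flagged at most once, so the nested loops together still do O(n) work.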
rici, answered Nov 12 '22 03:11