Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace duplicate numbers with unique numbers from 0-(N-1)

Tags:

java

algorithm

Background:

I have an N-length array of positive random numbers that are certain to contain duplicates. e.g. 10,4,5,7,10,9,10,9,8,10,5
Edit: N is likely to be 32, or some other power of two about that size.

The Problem:

I am trying to find the fastest way to replace the duplicates with the missing numbers from 0-(N-1). Using the above example, I want a result that looks like this:
10,4,5,7,0,9,1,2,8,3,6
The goal being to have one of each number from 0 to N-1, without just replacing all the numbers with 0-(N-1) (the random order is important).
Edit: It's also important that this replacement is deterministic, i.e. the same input will have the same output (not random).

My solution:

Currently implemented in Java, uses 2 boolean arrays to keep track of used/unused numbers (unique numbers/missing numbers in the range [0,N) ), and has an approximate worst-case runtime of N+N*sqrt(N).
The code follows:

public byte[] uniqueify(byte[] input)
{
    boolean[] usedNumbers = new boolean[N];
    boolean[] unusedIndices = new boolean[N];
    byte[] result = new byte[N];

    for(int i = 0; i < N; i++) // first pass through
    {
        int newIdx = (input[i] + 128) % N; // first make positive
        if(!usedNumbers[newIdx]) // if this number has not been used
        {
            usedNumbers[newIdx] = true; // mark as used
            result[i] = newIdx; // save it in the result
        }
        else // if the number is used
        {
            unusedIndices[i] = true; // add it to the list of duplicates
        }
    }

    // handle all the duplicates
    for(int idx = 0; idx < N; idx++) // iterate through all numbers
    {
        if(unusedIndices[idx]) // if unused
            for(int i = 0; i < N; i++) // go through all numbers again
            {
                if(!usedNumbers[i]) // if this number is still unused
                {
                    usedNumbers[i] = true; // mark as used
                    result[i] = idx;
                    break;
                }
            }
    }
    return result;
}  

This seems like the fastest I can hope for, but I thought I'd ask the internet, because there are people much more clever than I who may have a better solution.

N.B. Suggestions/solutions do not have to be in Java.

Thank you.

Edit: I forgot to mention that I am converting this to C++. I posted my java implementation because it's more complete.

like image 774
anro Avatar asked Apr 06 '12 08:04

anro


1 Answers

Use a balanced binary search tree to keep track of used/unused numbers instead of a boolean array. Then you're running time will be n log n.

The most straightforward solution would be this:

  1. Go through the list and build the "unused" BST
  2. Go through the list a second time, keeping track of numbers seen so far in a "used" BST
  3. If a duplicate is found, replace it with a random element of the "unused" BST.
like image 174
tskuzzy Avatar answered Sep 30 '22 20:09

tskuzzy