Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performant way to select N random distinct ints in java?

Tags:

java

random

I currently am looking for the best way so select x unique ints among a range of n ints. It would be like doing Random.nextInt(range) multiple time except it should never select twice the same int. If it happens that x > n then the result would only contain n ints

I tried to do this myself and I currently have done this based on the Fisher/Yates shuffle:

private static final Random R   = new Random();

public static int[] distinctRandoms(int nb, int max) {
    int[] all = new int[max];
    for (int i = 0; i < all.length; i++) {
        all[i] = i;
    }
    if (max <= nb) {
        return all;
    }
    int index;
    int[] result = new int[nb];

    for (int j = 0, k = all.length - 1; k > 0 && j < nb; k--, j++) {
        index = R.nextInt(k + 1);
        result[j] = all[index]; // save element
        all[index] = all[k]; // overwrite chosen with last element
    }
    return result;
}

It works and performance seems good but I can't help thinking there must still be some more performant way to do so and that i'm reinventing the wheel. I thought about doing things differently if nb > (max / 2) (remove elements rather than select elements) but as you can't truncate an array in java you still end up copying all the elements you need. This method costs alot if nb = max-1

Is there any built in way to randomly select distinct ints efficiently in java ?

Edit 1:

What I mean by performant is time-efficient. I want it to be fast. I'll mostly work with small sets of randoms.

Edit 2:

I tried using shuffle like that but it's much more expensive in terms of time because of all the extra object creation.

public static Integer[] distinctRandoms2(int nb, int max) {
    ArrayList<Integer> all = new ArrayList<Integer>(max);
    for (int i = 0; i < max; i++) {
        all.add(i);
    }
    if (max <= nb) {
        return all.toArray(new Integer[max]);
    }
    Collections.shuffle(all);
    return all.subList(0, nb).toArray(new Integer[nb]);
}
like image 244
Crystark Avatar asked Jan 26 '15 14:01

Crystark


2 Answers

You can use Floyd's algorithm. It is much more efficient than shuffling if the number of elements to be selected is smaller than their range.

private static final Random random = new Random();

/**
 * Converts a set of Integer to an array of int.
 */
private static int[] setToArray(Set<Integer> aSet) {
    int[] result = new int[aSet.size()];
    int index = 0;
    for (int number : aSet) {
        result[index] = number;
        index++;
    }
    return result;
}

/**
 * Generates an array of min(count, maxValue) distinct random ints 
 * from [0, maxValue - 1] range.
 * @param count The number of elements to be generated.
 * @param maxValue The upper bound of the range(exclusively).
 */
public static int[] getDistinctRandomNumbers(int count, int maxValue) {
    Set<Integer> was = new HashSet<>();
    for (int i = Math.max(0, maxValue - count); i < maxValue; i++) {
        int curr = i == 0 ? 0 : random.nextInt(i);
        if (was.contains(curr))
            curr = i;
        was.add(curr);
    }
    return setToArray(was);
}

It has O(count) time and space complexity, where count is number of distinct integers that should be generated.

like image 113
kraskevich Avatar answered Oct 25 '22 04:10

kraskevich


You can use shuffle method from java.util.Collections class.

Just create list of Integers from 0 to x-1, then call shuffle method on it and take first nb elements.

Using shuffle method has sense when nb is close to max. So it would be good for following pairs of parameters:

  • nb=70, max=100
  • nb=900, max=1000
  • nb=9000, max=10000

but not so good for:

  • nb=10, max=10^8
  • nb=100, max=10^9

It would be a good idea to combine above method (using shuffle) with Floyd's algorithm from other answer. Selection of algorithm should be based on ratio nb/max. Border ratio should be chosen carefully.

like image 27
rtruszk Avatar answered Oct 25 '22 04:10

rtruszk