Choosing k out of n

I want to choose k elements uniformly at random out of a possible n without choosing the same number twice. There are two trivial approaches to this.

1. Make a list of all n possibilities. Shuffle them (you don't need to shuffle all n numbers, just k of them, by performing the first k steps of Fisher-Yates). Choose the first k. This approach takes O(k) time (assuming allocating an array of size n takes O(1) time) and O(n) space. This is a problem if k is very small relative to n.
2. Store a set of seen elements. Choose a number at random from [0, n-1]. While the number is already in the set, choose a new one. This approach takes O(k) space. The run-time is a little more complicated to analyze. If k = Θ(n) then the expected run-time is O(k*lg(k)) = O(n*lg(n)) because it is the coupon collector's problem. If k is small relative to n then it takes slightly more than k draws because of the probability (albeit low) of choosing the same number twice. This is better than the above solution in terms of space but worse in terms of run-time.
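
For concreteness, approach 2 might look roughly like the following Java sketch. The class and method names (RejectionSampler, sample) are hypothetical and chosen only for illustration, and the O(1) expected cost per set operation is an assumption about HashSet.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper illustrating approach 2 (retry on duplicates).
final class RejectionSampler {
    // Returns k distinct integers from [0, n-1], in the order they were drawn.
    static int[] sample(int k, int n, Random rng) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        Set<Integer> seen = new HashSet<>();
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            int x = rng.nextInt(n);
            // Set.add returns false when x is already present, so redraw.
            while (!seen.add(x)) {
                x = rng.nextInt(n);
            }
            out[i] = x;
        }
        return out;
    }
}
```

When i numbers have already been chosen, a draw succeeds with probability (n-i)/n, so the expected total number of draws is the sum of n/(n-i) for i = 0, ..., k-1. That sum is close to k when k is small relative to n and grows to about n*ln(n) when k = n.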

My question: is there an O(k) time, O(k) space algorithm for all k and n?

asked Apr 25 '15 17:04

Benjy Kessler

1 Answer

With an O(1) hash table, the partial Fisher-Yates method can be made to run in O(k) time and space. The trick is simply to store only the changed elements of the array in the hash table.

Here's a simple example in Java:

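import java.util.HashMap;
import java.util.Random;

// Returns k distinct integers from [0, n-1] in uniformly random order.
public static int[] getRandomSelection (int k, int n, Random rng) {
    if (k < 0 || k > n) throw new IllegalArgumentException(
        "Cannot choose " + k + " elements out of " + n + "."
    );

    // Sparse view of the array a[x] = x: only positions whose value has changed are stored.
    HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>(2*k);
    int[] output = new int[k];

    for (int i = 0; i < k; i++) {
        int j = i + rng.nextInt(n - i);  // uniform index in [i, n-1]
        // output[i] = a[j]; position j is overwritten below or never read again
        output[i] = (hash.containsKey(j) ? hash.remove(j) : j);
        // a[j] = a[i]; position i is never read again, so its entry can be dropped
        if (j > i) hash.put(j, (hash.containsKey(i) ? hash.remove(i) : i));
    }
    return output;
}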

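This code allocates a HashMap with an initial capacity of 2×k to store the modified elements, and just runs a partial Fisher-Yates shuffle on it. The map never holds more than k entries, so with the default load factor of 0.75 this capacity should be enough to ensure that the hash table is never rehashed.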

Here's a quick test on Ideone; it picks two elements out of three 30,000 times, and counts the number of times each pair of elements gets chosen. For an unbiased shuffle, each ordered pair should appear approximately 5,000 (±100 or so) times, except for the impossible cases where both elements would be equal.
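
The linked test is not reproduced here, but a check along those lines could be sketched as follows. The class name SelectionFrequencyCheck and the helper countOutcomes are hypothetical; getRandomSelection is repeated from above only so that the sketch compiles on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical harness: tallies how often each ordered selection occurs.
final class SelectionFrequencyCheck {
    // Partial Fisher-Yates over a sparse array, as in the answer's code.
    static int[] getRandomSelection(int k, int n, Random rng) {
        if (k < 0 || k > n) throw new IllegalArgumentException(
            "Cannot choose " + k + " elements out of " + n + ".");
        HashMap<Integer, Integer> hash = new HashMap<>(2 * k);
        int[] output = new int[k];
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            output[i] = (hash.containsKey(j) ? hash.remove(j) : j);
            if (j > i) hash.put(j, (hash.containsKey(i) ? hash.remove(i) : i));
        }
        return output;
    }

    // Runs the selection `trials` times and counts each ordered outcome,
    // keyed by its string form such as "[0, 2]".
    static Map<String, Integer> countOutcomes(int k, int n, int trials, Random rng) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int t = 0; t < trials; t++) {
            String key = Arrays.toString(getRandomSelection(k, n, rng));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two out of three, 30,000 trials: six ordered pairs are possible.
        countOutcomes(2, 3, 30_000, new Random())
            .forEach((pair, count) -> System.out.println(pair + " -> " + count));
    }
}
```

With 30,000 trials and six equally likely ordered pairs, each count has a standard deviation of roughly 65, which is consistent with the "±100 or so" spread mentioned above.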

answered Sep 25 '22 04:09

Ilmari Karonen
