Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select N random elements from a List efficiently (without toArray and change the list)

As in the title, I want to use Knuth-Fisher-Yates shuffle algorithm to select N random elements from a List but without using List.toArray and change the list. Here is my current code:

public List<E> getNElements(List<E> list, Integer n) {
    List<E> rtn = null;

    if (list != null && n != null && n > 0) {
        int lSize = list.size();
        if (lSize > n) {
            rtn = new ArrayList<E>(n);
            E[] es = (E[]) list.toArray();
            //Knuth-Fisher-Yates shuffle algorithm 
            for (int i = es.length - 1; i > es.length - n - 1; i--) {
                int iRand = rand.nextInt(i + 1);
                E eRand = es[iRand];
                es[iRand] = es[i];
                //This is not necessary here as we do not really need the final shuffle result.
                //es[i] = eRand;
                rtn.add(eRand);
            }

        } else if (lSize == n) {
            rtn = new ArrayList<E>(n);
            rtn.addAll(list);
        } else {
            log("list.size < nSub! ", lSize, n);
        }
    }

    return rtn;
}

It uses list.toArray() to make a new array to avoid modifying the original list. However, my problem now is that my list could be very big, can have 1 million elements. Then list.toArray() is too slow. And my n could range from 1 to 1 million. When n is small (say 2), the function is very in-efficient as it still need to do list.toArray() for a list of 1 million elements.

Can someone help improve the above code to make it more efficient when dealing with large lists. Thanks.

Here I assume Knuth-Fisher-Yates shuffle is the best algorithm to do the job of selecting n random elements from a list. Am I right? I would be very glad to if there is other algorithms better than Knuth-Fisher-Yates shuffle to do the job in terms of the speed and the quality of the results (guarantee real randomness).

Update:

Here is some of my test results:

When selection n from 1000000 elements.

When n<1000000/4 the fastest way to through using Daniel Lemire's Bitmap function to select n random id first then get the elements with these ids:

public List<E> getNElementsBitSet(List<E> list, int n) {
        List<E> rtn = new ArrayList<E>(n);
        int[] ids = genNBitSet(n, 0, list.size());
        for (int i = 0; i < ids.length; i++) {
            rtn.add(list.get(ids[i]));
        }
        return rtn;
    }

The genNBitSet is using the code generateUniformBitmap from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java

When n>1000000/4 the Reservoir Sampling method is faster.

So I have built a function to combine these two methods.

like image 783
Changwang Zhang Avatar asked May 18 '14 08:05

Changwang Zhang


People also ask

How do you randomly select elements from a list?

Select randomly n elements from a list using choice() The choice() method is used to return a random number from given sequence. The sequence can be a list or a tuple. This returns a single value from available data that considers duplicate values in the sequence(list).

How do you select a random item from a list without choice in Python?

To use Python to select random elements without replacement, we can use the random. sample() function. The function accepts two parameters: the list to sample from and the number of items to sample.

How do you select a random item from a list in Java?

Picking a Random Item/Items In order to get a random item from a List instance, you need to generate a random index number and then fetch an item by this generated index number using List. get() method. The key point here is to remember that you mustn't use an index that exceeds your List's size.

How do you select a random element from an ArrayList?

nextInt() method of Random class can be used to generate a random value between 0 and the size of ArrayList. Now use this number as an index of the ArrayList. Use get () method to return a random element from the ArrayList using number generated from nextInt() method.


1 Answers

You are probably looking for something like Resorvoir Sampling.

Start with an initial array with first k elements, and modify it with new elements with decreasing probabilities:

java like pseudo code:

E[] r = new E[k]; //not really, cannot create an array of generic type, but just pseudo code
int i = 0;
for (E e : list) {
   //assign first k elements:
   if (i < k) { r[i++] = e; continue; }
   //add current element with decreasing probability:
   j = random(i++) + 1; //a number from 1 to i inclusive
   if (j <= k) r[j] = e;
}
return r;

This requires a single pass on the data, with very cheap ops every iteration, and the space consumption is linear with the required output size.

like image 175
amit Avatar answered Sep 24 '22 23:09

amit