Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lazy Shuffle Algorithms

I have a large list of elements that I want to iterate in random order. However, I cannot modify the list and I don't want to create a copy of it either, because 1) it is large and 2) it can be expected that the iteration is cancelled early.

List<T> data = ...;
Iterator<T> shuffled = shuffle(data);
while (shuffled.hasNext()) {
    T t = shuffled.next();
    if (System.console().readLine("Do you want %s?", t).startsWith("y")) {
        return t;
    }
}
System.out.println("That's all");
return t;

I am looking for an algorithm were the code above would run in O(n) (and preferably require only O(log n)space), so caching the elements that were produced earlier is not an option. I don't care if the algorithm is biased (as long as it's not obvious).

(I uses pseudo-Java in my question, but you can use other languages if you wish)

Here is the best I got so far.

Iterator<T> shuffle(final List<T> data) {
    int p = data.size();
    while ((data.size() % p) == 0) p = randomPrime();
    return new Iterator<T>() {
        final int prime = p;
        int n = 0, i = 0;
        public boolean hasNext() { return i < data.size(); }
        public T next() {
            i++; n += prime;
            return data.get(n);
        }
    }
}

Iterating all elements in O(n), constant space, but obviously biased as it can produce only data.size() permutations.

like image 328
Cephalopod Avatar asked Apr 23 '13 08:04

Cephalopod


4 Answers

The easiest shuffling approaches I know of work with indices. If the List is not an ArrayList, you may end up with a very inefficient algorithm if you try to use one of the below (a LinkedList does have a get by ID, but it's O(n), so you'll end up with O(n^2) time).

If O(n) space is fine, which I'm assuming it's not, I'd recommend the Fisher-Yates / Knuth shuffle, it's O(n) time and is easy to implement. You can optimise it so you only need to perform a single operation before being able to get the first element, but you'll need to keep track of the rest of the modified list as you go.

My solution:

Ok, so this is not very random at all, but I can't see a better way if you want less than O(n) space.

It takes O(1) space and O(n) time.

There may be a way to push it up the space usage a little and get more random results, but I haven't figured that out yet.

It has to do with relative primes. The idea is that, given 2 relative primes a (the generator) and b, when you loop through a % b, 2a % b, 3a % b, 4a % b, ..., you will see every integer 0, 1, 2, ..., b-2, b-1, and this will also happen before seeing any integer twice. Unfortunately I don't have a link to a proof (the wikipedia link may mention or imply it, I didn't check in too much detail).

I start off by increasing the length until we get a prime, since this implies that any other number will be a relative prime, which is a whole lot less limiting (and just skip any number greater than the original length), then generate a random number, and use this as the generator.

I'm iterating through and printing out all the values, but it should be easy enough to modify to generate the next one given the current one.

Note I'm skipping 1 and len-1 with my nextInt, since these will produce 1,2,3,... and ...,3,2,1 respectively, but you can include these, but probably not if the length is below a certain threshold.

You may also want to generate a random number to multiply the generator by (mod the length) to start from.

Java code:

static Random gen = new Random();

static void printShuffle(int len)
{
  // get first prime >= len
  int newLen = len-1;
  boolean prime;
  do
  {
     newLen++;
     // prime check
     prime = true;
     for (int i = 2; prime && i < len; i++)
        prime &= (newLen % i != 0);
  }
  while (!prime);

  long val = gen.nextInt(len-3) + 2;
  long oldVal = val;
  do
  {
     if (val < len)
        System.out.println(val);
     val = (val + oldVal) % newLen;
  }
  while (oldVal != val);
}
like image 159
Bernhard Barker Avatar answered Nov 20 '22 12:11

Bernhard Barker


This is an old thread, but in case anyone comes across this in future, a paper by Andrew Kensler describes a way to do this in constant time and constant space. Essentially, you create a reversible hash function, and then use it (and not an array) to index the list. Kensler describes a method for generating the necessary function, and discusses "cycle walking" as a way to deal with a domain that is not identical to the domain of the hash function. Afnan Enayet's summary of the paper is here: https://afnan.io/posts/2019-04-05-explaining-the-hashed-permutation/.

like image 23
Graham Mainwaring Avatar answered Nov 20 '22 12:11

Graham Mainwaring


You may try using a buffer to do this. Iterate through a limited set of data and put it in a buffer. Extract random values from that buffer and send it to output (or wherever you need it). Iterate through the next set and keep overwriting this buffer. Repeat this step.

You'll end up with n + n operations, which is still O(n). Unfortunately, the result will not be actually random. It will be close to random if you choose your buffer size properly.

On a different note, check these two: Python - run through a loop in non linear fashion, random iteration in Python

Perhaps there's a more elegant algorithm to do this better. I'm not sure though. Looking forward to other replies in this thread.

like image 1
metsburg Avatar answered Nov 20 '22 13:11

metsburg


This is not a perfect answer to your question, but perhaps it's useful.

The idea is to use a reversible random number generator and the usual array-based shuffling algorithm done lazily: to get the i'th shuffled item, swap a[i] with and a randomly chosen a[j] where j is in [i..n-1], then return a[i]. This can be done in the iterator.

After you are done iterating, reset the array to original order by "unswapping" using the reverse direction of the RNG.

The unshuffling reset will never take longer than the original iteration, so it doesn't change asymptotic cost. Iteration is still linear in the number of iterations.

How to build a reversible RNG? Just use an encryption algorithm. Encrypt the previously generated pseudo-random value to go forward, and decrypt to go backward. If you have a symmetric encryption algorithm, then you can add a "salt" value at each step forward to prevent a cycle of two and subtract it for each step backward. I mention this because RC4 is simple and fast and symmetric. I've used it before for tasks like this. Encrypting 4-byte values then computing mod to get them in the desired range will be quick indeed.

You can press this into the Java iterator pattern by extending Iterator to allow resets. See below. Usage will look like:

ShuffledList<Integer> lst = new SuffledList<>();

... build the list with the usual operations

ResetableInterator<Integer> i = lst.iterator();
while (i.hasNext()) {
  int val = i.next();

  ... use the randomly selected value

  if (anyConditinoAtAll) break;
}
i.reset();  // Unshuffle the array

I know this isn't perfect, but it will be fast and give a good shuffle. Note that if you don't reset, the next iterator will still be a new random shuffle, but the original order will be lost forever. If the loop body can generate an exception, you'd want the reset in a finally block.

class ShuffledList<T> extends ArrayList<T> implements Iterable<T> {

    @Override
    public Iterator<T> iterator() {
        return null;
    }   

    public interface ResetableInterator<T> extends Iterator<T> {
        public void reset();
    }

    class ShufflingIterator<T> implements ResetableInterator<T> {

        int mark = 0;

        @Override
        public boolean hasNext() {
            return true;
        }

        @Override
        public T next() {
            return null;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }

        @Override
        public void reset() {
            throw new UnsupportedOperationException("Not supported yet.");
        }
    }
}
like image 1
Gene Avatar answered Nov 20 '22 13:11

Gene