Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate a predictable shuffling of a sequence without generating the whole sequence in advance?

The following python code describes exactly what I want to achieve for a sequence of arbitrary size (population):

import random
fixed_seed = 1 #generate the same sequence every time with a fixed seed
population = 1000
sample_count = 5 #demonstration number
num_retries = 3  #just enough to show the repeatable behaviour
for trynum in xrange(num_retries):
    #generate the fresh/ordered sequence (0->population)...
    seq = range(population)
    #seed the random number generator the same way every time...
    random.seed(fixed_seed)
    #shuffle the sequence...
    random.shuffle(seq)
    #display results for this try...
    sample_sequence = [str(x) for x in seq[:sample_count]]
    print "try %s: %s..." % (trynum + 1, ", ".join(sample_sequence))
#Sample output...
#try 1: 995, 721, 62, 326, 541...
#try 2: 995, 721, 62, 326, 541...
#try 3: 995, 721, 62, 326, 541...

The problem with that method is that it requires generating the entire sequence in memory first. This can be a problem for huge populations.

Note that a potentially big advantage of this method is that you can pick off any array position at any time.

Now - If the problem at hand happens to let you set the population size to a power of two (minus 1), a Linear Feedback Shift Register can be used to get the predictable random sequence. LFSRs are neat, and explained pretty well in the wikipedia article on them.

The python code below demonstrates this (and I did a pile of uniqueness testing to ensure it works as advertised). See the wikipedia article again for an explanation of how the code works (Galois configuration).

TAP_MASKS = { #only one needed, but I included 3 to make the code more useful
    10: 0x00000240, #taps at 10, 7
    16: 0x0000B400, #taps at 16, 14, 13, 11
    32: 0xE0000200, #taps at 32, 31, 30, 10
}

def MaxLengthLFSR(seed, register_length):
    "Gets next value from seed in max-length LFSR using Galois configuration."
    lsb = seed & 1
    next_val = seed >> 1
    if lsb == 1:
        mask = TAP_MASKS[register_length]
        next_val ^= mask
    return next_val

reglen = 16  #number of bits in register
population = (2**reglen) - 1 #not used, just showing it
fixed_seed = 1   #seed == startval in this case (could randomize in population)
sample_count = 5 #demonstration number
num_retries = 3  #just enough to show the repeatable behaviour
for trynum in xrange(num_retries):
    next_val = fixed_seed
    seq = [fixed_seed, ]
    for x in xrange(sample_count - 1):
        next_val = MaxLengthLFSR(next_val, reglen)
        seq.append(next_val)
    seq = [str(x) for x in seq]
    print "try %s: %s..." % (trynum + 1, ", ".join(seq))
#Sample output...
#try 1: 1, 46080, 23040, 11520, 5760...
#try 2: 1, 46080, 23040, 11520, 5760...
#try 3: 1, 46080, 23040, 11520, 5760...

This is nice because you can have a HUGE population and easily calculate a repeatable non-repeating random number sequence without using a big chunk of memory.

The drawbacks are a) that it is limited to a "shuffling" sequences of size (2**N - 1), and b) that you cannot determine what the value of a particular position in the random sequence is at an arbitrary location. You need to know the value at a particular point and walk the sequence from there.

The latter (b) is mostly ok since most of the time you'll generate the sequence in order, so you just need to remember the last value. The power of 2 limitation (a) is kind of a deal killer,though... depending on the application.

How do you achieve maximum-length-LFSR-like non-repeating results for arbitrary sequence lengths?

As a bonus, it would be nice to have a solution where you are able to know the number at a given sequence position without needing to walk through the sequence to that position.


Note: if you want a good starting set of LFSR tap locations for maximum-length LFSRs (ones that generate the whole register population without repeating once), this link is quite good and has a huge number of tap locations per register size (up to 32 bits, anyway).

Also please note that I've seen many questions closely related to my question and shuffling/LFSR, but none of them exactly relate to what I'm after (predictable shuffle of arbitrary size linear sequence). Or at least as far as I have been able to understand them, anyway.

I've recently been looking into Linear Congruential Generators, which seem promising, but I haven't been able to get them to work yet. Rather than sitting on the question further I'll ask it, and post the answer if I figure it out and they work.

like image 867
Russ Avatar asked Oct 11 '10 21:10

Russ


1 Answers

I've actually written about this before: Secure Permutations with Block Ciphers. In a nutshell:

  1. Yes, you can use an LFSR to generate permutations with a length that's a power of 2. You can also use any block cipher. With a block cipher, you can also find the element at index n, or the index for element n.
  2. To generate a permutation with arbitrary length l, create one with the smallest power of 2 length greater than l. When you want to find the nth permutation element, apply the permutation function, and if it generates a number outside the desired range, apply it again; repeat until the number is in the acceptable range.

The number of iterations required for step 2 will average no more than 2; the worst case is high, but extremely unlikely to occur.

like image 127
Nick Johnson Avatar answered Sep 24 '22 18:09

Nick Johnson