Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maximal Length of List to Shuffle with Python random.shuffle?

I have a list which I shuffle with the Python built in shuffle function (random.shuffle)

However, the Python reference states:

Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.

Now, I wonder what this "rather small len(x)" means. 100, 1000, 10000,...

like image 260
Henrik Avatar asked Jun 17 '10 14:06

Henrik


People also ask

Can you shuffle a list in Python?

In Python, you can shuffle (= randomize) a list, string, and tuple with random. shuffle() and random. sample() .

How do I randomly shuffle in Python?

Python Random shuffle() MethodThe shuffle() method takes a sequence, like a list, and reorganize the order of the items. Note: This method changes the original list, it does not return a new list.

How do you shuffle a range in Python?

You can use range() along with list() to generate a list of integers, and then apply random. shuffle() on that list.

How can you randomize the items of a list in place in Python?

The shuffle() method randomizes the items of a list in place.


2 Answers

TL;DR: It "breaks" on lists with over 2080 elements, but don't worry too much :)

Complete answer:

First of all, notice that "shuffling" a list can be understood (conceptually) as generating all possible permutations of the elements of the lists, and picking one of these permutations at random.

Then, you must remember that all self-contained computerised random number generators are actually "pseudo" random. That is, they are not actually random, but rely on a series of factors to try and generate a number that is hard to be guessed in advanced, or purposefully reproduced. Among these factors is usually the previous generated number. So, in practice, if you use a random generator continuously a certain number of times, you'll eventually start getting the same sequence all over again (this is the "period" that the documentation refers to).

Finally, the docstring on Lib/random.py (the random module) says that "The period [of the random number generator] is 2**19937-1."

So, given all that, if your list is such that there are 2**19937 or more permutations, some of these will never be obtained by shuffling the list. You'd (again, conceptually) generate all permutations of the list, then generate a random number x, and pick the xth permutation. Next time, you generate another random number y, and pick the yth permutation. And so on. But, since there are more permutations than you'll get random numbers (because, at most after 2**19937-1 generated numbers, you'll start getting the same ones again), you'll start picking the same permutations again.

So, you see, it's not exactly a matter of how long your list is (though that does enter into the equation). Also, 2**19937-1 is quite a long number. But, still, depending on your shuffling needs, you should bear all that in mind. On a simplistic case (and with a quick calculation), for a list without repeated elements, 2081 elements would yield 2081! permutations, which is more than 2**19937.

like image 86
rbp Avatar answered Oct 09 '22 20:10

rbp


I wrote that comment in the Python source originally, so maybe I can clarify ;-)

When the comment was introduced, Python's Wichmann-Hill generator had a much shorter period, and we couldn't even generate all the permutations of a deck of cards.

The period is astronomically larger now, and 2080 is correct for the current upper bound. The docs could be beefed up to say more about that - but they'd get awfully tedious.

There's a very simple explanation: A PRNG of period P has P possible starting states. The starting state wholly determines the permutation produced. Therefore a PRNG of period P cannot generate more than P distinct permutations (and that's an absolute upper bound - it may not be achieved). That's why comparing N! to P is the correct computation here. And, indeed:

>>> math.factorial(2080) > 2**19937 - 1 False >>> math.factorial(2081) > 2**19937 - 1 True 
like image 25
Tim Peters Avatar answered Oct 09 '22 21:10

Tim Peters