
Why would I want to use a custom RNG for Array#shuffle?

Tags: random, ruby

The documentation for Array#shuffle states:

shuffle(random: rng) → new_ary

The optional rng argument will be used as the random number generator.

a.shuffle(random: Random.new(1))  #=> [1, 3, 2]

What does that mean and why would I want to do that?

awendt asked Mar 15 '23 10:03



2 Answers

The optional rng argument, when given a generator created with a fixed seed, produces a fixed, repeatable shuffle.

Let's try shuffle without the rng argument; we should get a different order on each call:

a = [1, 2, 3]
a.shuffle
# => [3, 2, 1]
a.shuffle
# => [2, 3, 1]

Now with a seeded rng:

a.shuffle(random: Random.new(1))
# => [1, 3, 2] 
a.shuffle(random: Random.new(1))
# => [1, 3, 2]

As you can see, when the generator is seeded with the same value, the shuffled array always comes out in the same order: [1, 3, 2] in this case.

Why would I want to do that?

(As mentioned in comments below)

Reproducible randomness is very valuable. It comes in handy in tests, games, etc.
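As a minimal sketch of how this might look in a test or a game, the helper below (shuffled_deck is a hypothetical name, not part of Ruby) builds a fresh Random from a seed each time, so the same seed should always give the same order:

```ruby
# Hypothetical helper: returns a deck whose order depends only on the seed.
def shuffled_deck(seed)
  deck = (1..52).to_a
  deck.shuffle(random: Random.new(seed))
end

first_run  = shuffled_deck(1234)
second_run = shuffled_deck(1234)

# Same seed, same order, so a test can rely on it.
puts first_run == second_run
# The result is still a permutation of the original deck.
puts first_run.sort == (1..52).to_a
```

Storing the seed (for example in a saved game or a failing test's log) is then enough to replay the exact same shuffle later.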

shivam answered Mar 23 '23 07:03



Internally the Array#shuffle method needs a source of random numbers. When you provide the optional RNG parameter, you are telling the method to use that object as the data source.

It is not directly for reproducibility. By default, shuffle uses the same shared generator as Kernel#rand, and that generator can be seeded using srand.

You can reproduce shuffles as follows:

srand(30)
[0,1,2,3,4,5,6].shuffle
# => [3, 1, 2, 0, 4, 6, 5]

srand(30)
[0,1,2,3,4,5,6].shuffle
# => [3, 1, 2, 0, 4, 6, 5]

If all you need is repeatability for tests, then srand will cover your needs.

So what is it for?

Shuffling an array requires a source of random numbers in order to work. By allowing you to override the default generator behind Kernel#rand, the design gives you control over how these numbers are sourced. Other methods that require a source of randomness allow a similar override, e.g. Array#sample.
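A short sketch of swapping the source, assuming a reasonably recent Ruby in which SecureRandom responds to rand (the variable names here are only for illustration):

```ruby
require 'securerandom'

items = %w[a b c d e]

# Use the operating system's secure generator instead of the default one.
secure_shuffle = items.shuffle(random: SecureRandom)

# Array#sample accepts the same keyword; a seeded Random makes the pick repeatable.
pick_a = items.sample(random: Random.new(42))
pick_b = items.sample(random: Random.new(42))

puts secure_shuffle.sort == items  # still the same elements
puts pick_a == pick_b              # same seed, same pick
```

The first call trades repeatability for unpredictability; the second pair does the opposite. Either way the array methods stay the same and only the source changes.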

This level of control lets you shuffle arrays independently of any other part of your code that draws on random numbers. Reproducible output is one useful result. Another is isolation from other parts of the program that use random numbers, which may or may not need reproducible results, or which may run at times you cannot control.
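The independence point can be sketched as follows. The bare rand call stands in for unrelated code that happens to consume a number from the shared generator; the seed 30 and the variable names are arbitrary choices for illustration:

```ruby
data = (0...100).to_a

# Shared generator: anything else that calls rand shifts the sequence.
srand(30)
shared_first = data.shuffle
srand(30)
rand                                  # unrelated code consuming a number
shared_second = data.shuffle

# Dedicated generator: calls to Kernel#rand do not touch its state.
rng = Random.new(30)
isolated_first = data.shuffle(random: rng)
rng = Random.new(30)
rand                                  # same unrelated call as before
isolated_second = data.shuffle(random: rng)

puts isolated_first == isolated_second  # the dedicated generator is unaffected
puts shared_first == shared_second      # expected to differ, since the shared state moved
```

With srand alone, the shuffle is only repeatable if every other consumer of the shared generator also runs in exactly the same order.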

In addition, shuffling algorithms have trouble producing an even distribution over a long list. To shuffle N items, the RNG must be able to produce at least N! (factorial of N) distinct sequences of numbers; otherwise it cannot possibly produce every allowed arrangement. For Ruby's built-in RNG, this limit is reached in theory at around 2000 items, provided the srand value has been chosen from a high-quality random source. You can do better by using an RNG that has an even higher limit, or a "true" RNG that sources its data from a physical system.
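The "around 2000" figure can be checked with a rough calculation. The sketch below assumes that Ruby's built-in generator is a Mersenne Twister with a period of 2**19937 - 1 (as the Random documentation describes it) and finds the smallest N whose N! exceeds that period:

```ruby
# Assumed period of the built-in Mersenne Twister generator.
period = 2**19937 - 1

n = 1
arrangements = 1          # holds n! for the current n

# Grow n until n! no longer fits within the generator's period.
while arrangements <= period
  n += 1
  arrangements *= n
end

# n is now the first list length with more arrangements than the period,
# so lists shorter than n are the ones the generator could cover in theory.
puts n
```

This is only an upper bound: it ignores how the seed is chosen and how many distinct seeds are actually used in practice.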

Neil Slater answered Mar 23 '23 07:03
