Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Array#sample guarantee random order?

Tags:

random

ruby

Does it only guarantee a random subset, or also random order?

Use case is generating a secret string with ('a'..'z').to_a.sample(8).join. I'd like to know whether I can trust that all 26⋅25⋅24⋅23⋅22⋅21⋅20⋅19 possible outcomes are equally likely.

The documentation says:

Choose [...] n random elements from the array.

The elements are chosen by using random and unique indices into the array in order to ensure that an element doesn't repeat itself unless the array already contained duplicate elements.

I see three possible interpretations. For example for [1, 2, 3].sample(2):

  • Return [1, 2], [1, 3], [2, 1], [2, 3], [3, 1] or [3, 2], each with probability 1/6. This is what you get when you pick a random element as the first output element and then another random element (from the remaining ones) as the second output element.
  • Return [1, 2], [1, 3] or [2, 3], each with probability 1/3. This is what you get by choosing a subset of the indices and then going through the array, collecting the elements if their index is among the chosen ones.
  • Something weird in between those two. For example return [1, 2] or [1, 3], each with probability 1/3, or [2, 3] or [3, 2], each with probability 1/6.

I tested it and the first interpretation is what happened. And looking at the source code, I also got the impression that that's what it does in general. But I'm worried that I'm overlooking something, or that that's just a side effect of the current implementation, and that it could change in the future (or already is different in some Ruby implementations). And I can imagine the second interpretation/implementation being beneficial, either because one wants such "stable" sampling, or for efficiency reasons.

Is my first interpretation what it's supposed to do? Can I rely on the result not only being a random subset but also having a random order? And shouldn't the documentation be clearer about this?


Here's my test code with statistics, in case you want to try it yourself:

array = (1..3).to_a
n = 2

count = Hash.new(0)
(10**6).times do
  count[array.sample(n)] += 1
end

puts "#{count.size} different samples occurred."
puts "Smallest was #{count.keys.min}, largest was #{count.keys.max}."
puts "Frequencies ranged from #{count.values.min} to #{count.values.max}."

Outputs for example:

6 different samples occurred.
Smallest was [1, 2], largest was [3, 2].
Frequencies ranged from 165698 to 167234.

Edit: I created a Ruby issue.

like image 907
Stefan Pochmann Avatar asked Oct 21 '25 00:10

Stefan Pochmann


1 Answers

I expect the method's name derives from the way samples are drawn without replacement in statistics. In that context the elements of the population being sampled are not necessary ordered, and if they are ordered that has no bearing on how the sampling is done.

The usual way of explaining sampling without replacement is drawing a certain number of balls at random from a container, with each ball drawn being put aside before the next ball is drawn. The balls may be identified by color or number or in some other way, but the result of the sampling does not reflect any concept of ordering.

Keep in mind that the method sample is defined on the class Array, but the elements of arrays are not necessarily ordered. For example,

[1, "1", :one, 1..2, { a: 1 }].sample(2) # => [{:a=>1}, :one]

Obviously, the elements of this sample cannot be ordered.

It is conceivable that sample could have been constructed so that it would order the elements of the sample if they were comparable, but I cannot think of another Ruby method that behaves that way. Moreover, implementing that would be problematic. It would be easy to determine that the elements of [1,2,3,4] are comparable (using Integer#<=>), but it would not always be so simple. Suppose, for example, the array were

[1, 2.5, 3/2r, BigDecimal.new("2.1")]

These elements are in fact comparable ([1, (3/2), 0.21e1, 2.5] sorted), but Ruby would have to do some work to make that determination. I suppose Ruby could attempt to sort the sample and rescue the exception if the elements were not comparable, but that seems quite a stretch.

like image 151
Cary Swoveland Avatar answered Oct 25 '25 06:10

Cary Swoveland



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!