I think this is a great Ruby one-liner:
someArray.sort_by {rand}
It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:
rand
evaluates to a number between 0 and 1 (like 0.783468632804653)rand
is being repeatedly evaluated in the code above, because assigning it to x
first breaks the random sortsort_by {0.783468632804653}
, or any other number I tried, has no effect on the arrayruby-doc.org wasn't much help to me in this case.
Can someone explain this step-by-step?
I've been using Ruby longer now, and I see that I was missing a concept or two here. The key thing is that:
rand
is a method (defined on Kernel); it generates a random number{rand}
is a block, which sort_by
keeps, calling it each time it wants to compare two items in the collection. If the collection is a bunch of objects representing countries, it needs to be able to grab two of them and determine which one comes first. Do you put the one with the longest name first? The one with the largest land mass? The block should answer that question by returning a value that says "you asked about Spain vs Cameroon, and I say Cameroon comes first." (You could do that with {|country| country.name.length}
The rest of how sort_by
works is explained in the documentation. I'm still not quite sure why returning a random number works at all - presumably sort_by
rounds it to -1, 0, or 1, whichever is closest? But in any case, getting a different random number every time you call the block is quite different from getting the same number every time. When sort_by
says "which of these two countries comes first?", {rand}
puts on a blindfold, turns around 10 times, points and says "that one!" :)
In Ruby 1.8/1.9 both sort
and sort_by
are implemented in C, this is a rough equivalent of how this works:
Say you start with [1,2,3,4]
and call sort_by{rand}
:
(I invented some random numbers):
An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]
In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}
Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)
[[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]
In rough Ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}
Pointers are copied from the new sorted array, to the correct position in the original array.
[1,3,2,4]
In rough Ruby this step is: ary.map{|x,y| y}
This technique is sometimes referred to as a "Schwartzian Transform". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.
Note: array.shuffle!
will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates:
static VALUE rb_ary_shuffle_bang(VALUE ary) { long i = RARRAY_LEN(ary); rb_ary_modify(ary); while (i) { long j = rb_genrand_real()*i; VALUE tmp = RARRAY_PTR(ary)[--i]; RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j]; RARRAY_PTR(ary)[j] = tmp; } return ary; }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With