Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Ruby's sort_by {rand} work?

Tags:

I think this is a great Ruby one-liner:

someArray.sort_by {rand} 

It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:

  1. rand evaluates to a number between 0 and 1 (like 0.783468632804653)
  2. rand is being repeatedly evaluated in the code above, because assigning it to x first breaks the random sort
  3. sort_by {0.783468632804653}, or any other number I tried, has no effect on the array

ruby-doc.org wasn't much help to me in this case.

Can someone explain this step-by-step?

Update

I've been using Ruby longer now, and I see that I was missing a concept or two here. The key thing is that:

  1. rand is a method (defined on Kernel); it generates a random number
  2. {rand} is a block, which sort_by keeps, calling it each time it wants to compare two items in the collection. If the collection is a bunch of objects representing countries, it needs to be able to grab two of them and determine which one comes first. Do you put the one with the longest name first? The one with the largest land mass? The block should answer that question by returning a value that says "you asked about Spain vs Cameroon, and I say Cameroon comes first." (You could do that with {|country| country.name.length}

The rest of how sort_by works is explained in the documentation. I'm still not quite sure why returning a random number works at all - presumably sort_by rounds it to -1, 0, or 1, whichever is closest? But in any case, getting a different random number every time you call the block is quite different from getting the same number every time. When sort_by says "which of these two countries comes first?", {rand} puts on a blindfold, turns around 10 times, points and says "that one!" :)

like image 322
Nathan Long Avatar asked Jan 11 '10 04:01

Nathan Long


1 Answers

In Ruby 1.8/1.9 both sort and sort_by are implemented in C, this is a rough equivalent of how this works:

Say you start with [1,2,3,4] and call sort_by{rand}:

  1. (I invented some random numbers):

    An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]

    In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}

  2. Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

    [[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]] 

    In rough Ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}

  3. Pointers are copied from the new sorted array, to the correct position in the original array.

    [1,3,2,4] 

    In rough Ruby this step is: ary.map{|x,y| y}

This technique is sometimes referred to as a "Schwartzian Transform". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

Note: array.shuffle! will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates:

static VALUE rb_ary_shuffle_bang(VALUE ary) {     long i = RARRAY_LEN(ary);      rb_ary_modify(ary);     while (i) {   long j = rb_genrand_real()*i;   VALUE tmp = RARRAY_PTR(ary)[--i];   RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];   RARRAY_PTR(ary)[j] = tmp;     }     return ary; } 
like image 192
Sam Saffron Avatar answered Oct 03 '22 23:10

Sam Saffron