 

ruby Enumerable#first vs #take

Tags:

ruby

What's the difference between Ruby's Enumerable/Array first(n) and take(n)?

I vaguely recall take has something to do with lazy evaluation, but I can't figure out how to use it to do that, and can't find anything useful googling or in docs. "take" is a hard method name to google for.

first(n) and take(n) are documented almost identically, which isn't very helpful:

first → obj or nil
first(n) → an_array
Returns the first element, or the first n elements, of the enumerable. If the enumerable is empty, the first form returns nil, and the second form returns an empty array.

-

take(n) → array
Returns first n elements from enum.

Telling me "take has something to do with lazy evaluation" isn't enough; I sort of remember that already. I need an example of how to use it for that, compared to first.

asked Feb 09 '15 19:02 by jrochkind


1 Answer

Well, I've looked at the source (Ruby 2.1.5). Under the hood, if first is provided an argument, it forwards it to take. Otherwise, it returns a single value:

static VALUE
enum_first(int argc, VALUE *argv, VALUE obj)
{
    NODE *memo;
    rb_check_arity(argc, 0, 1);
    if (argc > 0) {
        /* first(n): delegate straight to take(n) */
        return enum_take(obj, argv[0]);
    }
    else {
        /* first: run #each, keep the first element, then break out */
        memo = NEW_MEMO(Qnil, 0, 0);
        rb_block_call(obj, id_each, 0, 0, first_i, (VALUE)memo);
        return memo->u1.value;
    }
}

take, on the other hand, requires an argument and always returns an array of the given size or smaller, containing the elements taken from the beginning.

static VALUE
enum_take(VALUE obj, VALUE n)
{
    NODE *memo;
    VALUE result;
    long len = NUM2LONG(n);

    if (len < 0) {
        rb_raise(rb_eArgError, "attempt to take negative size");
    }

    /* take(0) returns an empty array without calling #each */
    if (len == 0) return rb_ary_new2(0);
    result = rb_ary_new2(len);
    memo = NEW_MEMO(result, 0, len);
    /* take_i pushes each element and breaks once len elements are collected */
    rb_block_call(obj, id_each, 0, 0, take_i, (VALUE)memo);
    return result;
}
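The callbacks first_i and take_i aren't quoted above, but both break out of #each as soon as enough elements have been collected, so neither method walks the whole collection. A small sketch to observe this, assuming a hypothetical CountingSeq class (made up for this illustration, not part of Ruby) that counts how many elements its #each has yielded:

```ruby
# Hypothetical Enumerable that counts how many elements #each has yielded,
# so we can see how far Enumerable#first and #take actually iterate.
class CountingSeq
  include Enumerable
  attr_reader :yielded

  def initialize(limit)
    @limit = limit
    @yielded = 0
  end

  def each
    (1..@limit).each do |i|
      @yielded += 1
      yield i
    end
    self
  end
end

seq = CountingSeq.new(1_000)

first_one = seq.first          # no argument: a single element
after_first = seq.yielded      # iteration stopped right after that element

taken = seq.take(3)            # an array of 3 elements
after_take = seq.yielded - after_first   # elements pulled by take(3)

empty = seq.take(0)            # [] without calling #each at all

negative_message =
  begin
    seq.take(-1)
  rescue ArgumentError => e
    e.message                  # "attempt to take negative size"
  end

p [first_one, after_first, taken, after_take, empty, seq.yielded]
# => [1, 1, [1, 2, 3], 3, [], 4]
```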

So yes, that's why these two are so similar. The only difference seems to be that first can be called without arguments, in which case it returns a single value rather than an array. <...>.first(1), on the other hand, is equivalent to <...>.take(1). As simple as that.
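To make the eager case concrete, here is a sketch using a hypothetical Trio class (defined only for this illustration). It includes Enumerable and defines nothing but #each, so the calls below go to Enumerable#first and Enumerable#take rather than to Array's own versions:

```ruby
# Hypothetical minimal Enumerable: only #each is defined, so #first and
# #take come from Enumerable.
class Trio
  include Enumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

t = Trio.new(:a, :b, :c)
p t.first      # => :a     (no argument: a single element, not an array)
p t.first(1)   # => [:a]   (with an argument: forwarded to take)
p t.take(1)    # => [:a]

e = Trio.new
p e.first      # => nil
p e.first(2)   # => []
p e.take(2)    # => []

# take has no zero-argument form
begin
  t.take
rescue ArgumentError => err
  puts "take needs an argument (#{err.class})"
end
```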

With lazy collections, however, things are different. first on a lazy collection is still enum_first, which, as seen above, is a shortcut to enum_take. take, however, is implemented in C as lazy_take:

static VALUE
lazy_take(VALUE obj, VALUE n)
{
    long len = NUM2LONG(n);
    VALUE lazy;

    if (len < 0) {
        rb_raise(rb_eArgError, "attempt to take negative size");
    }
    if (len == 0) {
        /* take(0): an empty lazy enumerator built via cycle(0) */
        VALUE len = INT2FIX(0);
        lazy = lazy_to_enum_i(obj, sym_cycle, 1, &len, 0);
    }
    else {
        /* wrap obj in a new Enumerator::Lazy; nothing is iterated here */
        lazy = rb_block_call(rb_cLazy, id_new, 1, &obj,
                             lazy_take_func, n);
    }
    return lazy_set_method(lazy, rb_ary_new3(1, n), lazy_take_size);
}

...which doesn't evaluate immediately; it requires a .force call for that.

In fact, this is hinted at in the docs: the documentation for Enumerator::Lazy lists all the lazily implemented methods. The list contains take but not first. In other words, on lazy sequences take stays lazy and first doesn't.
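One way to see that take stays lazy is to record which elements an upstream map block actually processes. A minimal sketch; evaluated is just a local array used for bookkeeping:

```ruby
evaluated = []   # bookkeeping: every element the map block has processed

lz = (1..Float::INFINITY).lazy.map { |i| evaluated << i; i * 10 }

pending = lz.take(3)           # just another Enumerator::Lazy; map has not run
before_force = evaluated.dup   # nothing evaluated yet

values = pending.force         # the pipeline runs now and stops after 3 elements

p pending.class   # => Enumerator::Lazy
p before_force    # => []
p values          # => [10, 20, 30]
p evaluated       # => [1, 2, 3]
```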

Here's an example of how these work differently:

lz = (1..Float::INFINITY).lazy.map{|i| i }
# An infinite sequence, evaluating it head-on won't do
# Ruby 2.2 also offers `.map(&:itself)`

lz.take(5)
#=> #<Enumerator::Lazy: ...>
# Well, `take` is lazy then
# Still, we need values

lz.take(5).force
#=> [1, 2, 3, 4, 5]
# Why yes, values, finally

lz.first(5)
#=> [1, 2, 3, 4, 5]
# So `first` is not lazy; it evaluates values immediately
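As for how to actually use take for lazy evaluation: the practical difference is what you can chain afterwards. take(n) returns another lazy step, so further lazy operations can follow it, while first(n) ends the pipeline with a plain Array. A sketch (squares is just an illustrative name):

```ruby
squares = (1..Float::INFINITY).lazy.map { |i| i * i }

# take(n) keeps the pipeline lazy, so more lazy steps can follow it...
odd_of_first_ten = squares.take(10).select(&:odd?)
p odd_of_first_ten.class     # => Enumerator::Lazy
p odd_of_first_ten.force     # => [1, 9, 25, 49, 81]

# ...whereas first(n) ends the pipeline and hands back a plain Array,
# so anything chained after it is ordinary eager Array processing.
first_ten = squares.first(10)
p first_ten.class            # => Array
p first_ten.select(&:odd?)   # => [1, 9, 25, 49, 81]

# Order matters: filtering first and then taking is still lazy and finite.
p squares.select(&:odd?).take(3).to_a   # => [1, 9, 25]
```

Note that first(n) on a lazy enumerator still stops after n elements, so it is safe on infinite sequences; it simply returns an Array instead of another lazy step.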

For some extra fun, run code written for 2.2 (<...>.lazy.map(&:itself)) on a version prior to 2.2. Since itself doesn't exist there, the map block only fails when elements are actually evaluated, so the moment you lose laziness you immediately get a NoMethodError.

answered Sep 26 '22 12:09 by D-side