In Ruby - Compare two Enumerators elegantly, it was said
The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with length of input params
I had a look at the implementation of Enumerable#zip in YARV, and saw
static VALUE
enum_zip(int argc, VALUE *argv, VALUE obj)
{
int i;
ID conv;
NODE *memo;
VALUE result = Qnil;
VALUE args = rb_ary_new4(argc, argv);
int allary = TRUE;
argv = RARRAY_PTR(args);
for (i=0; i<argc; i++) {
VALUE ary = rb_check_array_type(argv[i]);
if (NIL_P(ary)) {
allary = FALSE;
break;
}
argv[i] = ary;
}
if (!allary) {
CONST_ID(conv, "to_enum");
for (i=0; i<argc; i++) {
argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
}
}
if (!rb_block_given_p()) {
result = rb_ary_new();
}
/* use NODE_DOT2 as memo(v, v, -) */
memo = rb_node_newnode(NODE_DOT2, result, args, 0);
rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);
return result;
}
Am I understanding the following bits correctly?
Check whether all of the arguments are arrays, and if so, replace some indirect reference to the array with a direct reference
for (i=0; i<argc; i++) {
VALUE ary = rb_check_array_type(argv[i]);
if (NIL_P(ary)) {
allary = FALSE;
break;
}
argv[i] = ary;
}
If they aren't all arrays, create an enumerator instead
if (!allary) {
CONST_ID(conv, "to_enum");
for (i=0; i<argc; i++) {
argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
}
}
Create an array of arrays only if a block isn't given
if (!rb_block_given_p()) {
result = rb_ary_new();
}
If everything is an array, use zip_ary
, otherwise use zip_i
, and call a block on each set of values
/* use NODE_DOT2 as memo(v, v, -) */
memo = rb_node_newnode(NODE_DOT2, result, args, 0);
rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);
Return an array of arrays if no block is given, else return nil (Qnil
)?
return result;
}
I'll be using 1.9.2-p0 as that's what I have on hand.
The rb_check_array_type
function looks like this:
VALUE
rb_check_array_type(VALUE ary)
{
return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");
}
And rb_check_convert_type
looks like this:
VALUE
rb_check_convert_type(VALUE val, int type, const char *tname, const char *method)
{
VALUE v;
/* always convert T_DATA */
if (TYPE(val) == type && type != T_DATA) return val;
v = convert_type(val, tname, method, FALSE);
if (NIL_P(v)) return Qnil;
if (TYPE(v) != type) {
const char *cname = rb_obj_classname(val);
rb_raise(rb_eTypeError, "can't convert %s to %s (%s#%s gives %s)",
cname, tname, cname, method, rb_obj_classname(v));
}
return v;
}
Note the convert_type
call. This looks a lot like C version of Array.try_convert
and try_convert
just happens to look like this:
/*
* call-seq:
* Array.try_convert(obj) -> array or nil
*
* Try to convert <i>obj</i> into an array, using +to_ary+ method.
* Returns converted array or +nil+ if <i>obj</i> cannot be converted
* for any reason. This method can be used to check if an argument is an
* array.
*
* Array.try_convert([1]) #=> [1]
* Array.try_convert("1") #=> nil
*
* if tmp = Array.try_convert(arg)
* # the argument is an array
* elsif tmp = String.try_convert(arg)
* # the argument is a string
* end
*
*/
static VALUE
rb_ary_s_try_convert(VALUE dummy, VALUE ary)
{
return rb_check_array_type(ary);
}
So, yes, the first loop is looking for anything in argv
that is not an array and setting the allary
flag if it finds such a thing.
In enum.c
, we see this:
id_each = rb_intern("each");
So id_each
is an internal reference for the Ruby each
iterator method. And in vm_eval.c
, we have this:
/*!
* Calls a method
* \param recv receiver of the method
* \param mid an ID that represents the name of the method
* \param n the number of arguments
* \param ... arbitrary number of method arguments
*
* \pre each of arguments after \a n must be a VALUE.
*/
VALUE
rb_funcall(VALUE recv, ID mid, int n, ...)
So this:
argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
Is calling to_enum
(with, essentially, the default argument) on whatever is in argv[i]
.
So, the end result of the first for
and if
blocks is that argv
is either full of arrays or full of enumerators rather than possibly being a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip
function will wrap arrays in enumerators (which is essentially free or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (prefer arrays over enumerators), I'll leave that as an exercise for the reader or historians.
The next part:
if (!rb_block_given_p()) {
result = rb_ary_new();
}
Creates a new empty array and leaves it in result
if zip
is being called without a block. And here we should note what zip
returns:
enum.zip(arg, ...) → an_array_of_array
enum.zip(arg, ...) {|arr| block } → nil
If there is a block, then there is nothing to return and result
can stay as Qnil
; if there isn't a block, then we need an array in result
so that an array can be returned.
From parse.c
, we see that NODE_DOT2
is a double-dot range but it looks like they're just using the new node as a simple three element struct; rb_new_node
just allocates an object, sets some bits, and assigns three values in a struct:
NODE*
rb_node_newnode(enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
NODE *n = (NODE*)rb_newobj();
n->flags |= T_NODE;
nd_set_type(n, type);
n->u1.value = a0;
n->u2.value = a1;
n->u3.value = a2;
return n;
}
nd_set_type
is just a bit fiddling macro. Now we have memo
as just a three element struct. This use of NODE_DOT2
appears to be a convenient kludge.
The rb_block_call
function appears to be the core internal iterator. And we see our friend id_each
again so we'll be doing an each
iteration. Then we see a choice between zip_i
and zip_ary
; this is where the inner arrays are created and pushed onto result
. The only difference between zip_i
and zip_ary
appears to be the StopIteration exception handling in zip_i
.
At this point we've done the zipping and we either have the array of arrays in result
(if there was no block) or we have Qnil
in result
(if there was a block).
Executive Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i
and zip_ary
calls will only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip
with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With