How does Array#reject! work exactly?

Today I wrote a very small Ruby script that uses regexes to find certain content in files with a specific name. It removes that content before adding its replacement (otherwise things would go wrong during the iterations).

I'm not very used to Ruby (I've only been using it since my vacation job started 1-2 weeks ago), but whatever language I'm using, I make a habit of not modifying lists (or most other index-based ADTs) while iterating over them, e.g. to remove certain content.

After some searching I found a few Array methods that could help. Right now I'm calling reject! { |line| line =~ regex } on the array of lines, and the script works as intended, but I honestly can't figure out why it never skips an element. My sources (the Ruby docs and another website) confirm that the array is changed immediately during the iteration, which makes me wonder why this doesn't mess things up. The lines being removed are adjacent, with nothing between them; each one simply ends with its own \n.
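For contrast, a minimal Ruby sketch of the pattern this worry is about: an index loop that deletes a matching element but always moves the index forward. naive_reject! is a hypothetical helper written only for this illustration, and the sample lines are made up.

```ruby
# Hypothetical helper (not part of Ruby's API): deletes matching elements
# while ALWAYS advancing the index, i.e. the pattern that does skip elements.
def naive_reject!(arr)
  i = 0
  while i < arr.length
    arr.delete_at(i) if yield(arr[i]) # later elements shift one slot left...
    i += 1                            # ...but i moves on anyway, skipping one
  end
  arr
end

lines = ["keep\n", "drop 1\n", "drop 2\n", "keep\n"]
p naive_reject!(lines.dup) { |line| line =~ /drop/ }
# => ["keep\n", "drop 2\n", "keep\n"]   ("drop 2" was never examined)
p lines.dup.reject! { |line| line =~ /drop/ }
# => ["keep\n", "keep\n"]
```

With two adjacent matches, the naive loop leaves the second one in place, while the built-in reject! removes both.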

Has anyone got a great explanation for this?

asked Jul 13 '12 09:07 by olivier.va


2 Answers

Array#reject! uses a for loop to iterate over the array's elements. Here's the C code:

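for (i = 0; i < RARRAY_LEN(ary); ) {     /* note: no i++ here */
  VALUE v = RARRAY_PTR(ary)[i];
  if (RTEST(rb_yield(v))) {               /* block returned neither nil nor false */
    rb_ary_delete_at(ary, i);             /* later elements shift left into slot i */
    result = ary;
  }
  else {
    i++;                                  /* advance only past kept elements */
  }
}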

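The interesting part is that i is not incremented in the for statement itself. If the block given to reject! returns a truthy value (anything other than nil or false, which is what RTEST checks), the current element is removed; rb_ary_delete_at shifts all later elements one slot to the left, so ary[i] now holds what used to be the next element. Only when the block returns nil or false is i incremented. Because the loop condition re-reads RARRAY_LEN(ary) on every pass, it also sees the shortened length.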

%w[a b c d].reject! { |x| x == "b" }   # => ["a", "c", "d"]

 0 <------- i # doesn't match => i++
[a b c d]

   1 <----- i # matches => delete ary[i]
[a b c d]

   1 <----- i # doesn't match => i++
[a c d]

     2 <--- i # doesn't match => i++; i == length => finished
[a c d]
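The same loop can be transliterated into Ruby as a rough sketch, assuming the block plays the role of rb_yield and that Ruby's own truthiness matches RTEST (only nil and false count as false). my_reject! is a hypothetical name; this is not production code, just a way to watch the index behave as in the trace above.

```ruby
# Ruby transliteration (sketch) of the C loop quoted above; `my_reject!`
# is hypothetical. Like the C version, it returns nil when nothing was
# removed and the modified array otherwise.
def my_reject!(arr)
  result = nil
  i = 0
  while i < arr.length          # length is re-read on every pass
    if yield(arr[i])
      arr.delete_at(i)          # next element slides into slot i; don't advance
      result = arr
    else
      i += 1                    # only advance past elements we keep
    end
  end
  result
end

letters = %w[a b c d]
p my_reject!(letters) { |x| x == "b" }     # => ["a", "c", "d"]
p my_reject!(%w[a b b c]) { |x| x == "b" } # => ["a", "c"]
p my_reject!(%w[a c]) { |x| x == "b" }     # => nil
```

The %w[a b b c] case shows two adjacent matches both being removed: after the first "b" is deleted, the second one slides into the same slot and is tested on the next pass.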
answered Nov 24 '22 23:11 by Stefan


Here's the source code for ary_reject_bang, the heart of the C implementation of reject!.

static VALUE
ary_reject_bang(VALUE ary)
{
    long i;
    VALUE result = Qnil;

    rb_ary_modify_check(ary);
    for (i = 0; i < RARRAY_LEN(ary); ) {
        VALUE v = RARRAY_PTR(ary)[i];
        if (RTEST(rb_yield(v))) {
            rb_ary_delete_at(ary, i);
            result = ary;
        }
        else {
            i++;
        }
    }
    return result;
}

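RARRAY_PTR is a macro defined in ruby.h that gives you access to the underlying C array of a Ruby array. The actual removal is done by rb_ary_delete_at, which uses MEMMOVE to shift every element after pos one slot to the left and ARY_INCREASE_LEN(ary, -1) to shrink the length by one, so the array stays contiguous and in order: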

VALUE
rb_ary_delete_at(VALUE ary, long pos)
{
    long len = RARRAY_LEN(ary);
    VALUE del;

    if (pos >= len) return Qnil;
    if (pos < 0) {
        pos += len;
        if (pos < 0) return Qnil;
    }

    rb_ary_modify(ary);
    del = RARRAY_PTR(ary)[pos];
    MEMMOVE(RARRAY_PTR(ary)+pos, RARRAY_PTR(ary)+pos+1, VALUE,
        RARRAY_LEN(ary)-pos-1);
    ARY_INCREASE_LEN(ary, -1);

    return del;
}
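As an illustration only, here is a Ruby sketch of the steps rb_ary_delete_at performs. my_delete_at is hypothetical; the element-by-element copy stands in for MEMMOVE, and pop stands in for ARY_INCREASE_LEN(ary, -1).

```ruby
# Sketch of rb_ary_delete_at's steps in Ruby; `my_delete_at` is hypothetical.
def my_delete_at(arr, pos)
  len = arr.length
  return nil if pos >= len
  if pos < 0
    pos += len                 # -1 means "last element"
    return nil if pos < 0
  end
  del = arr[pos]
  (pos...len - 1).each { |j| arr[j] = arr[j + 1] } # like MEMMOVE: shift tail left
  arr.pop                      # like ARY_INCREASE_LEN(ary, -1): drop last slot
  del
end

arr = %w[a b c d]
p my_delete_at(arr, 1)   # => "b"
p arr                    # => ["a", "c", "d"]
p my_delete_at(arr, -1)  # => "d"
p my_delete_at(arr, 5)   # => nil
```

Because every deletion copies the whole tail, the reject! loop quoted above does work proportional to the array length for each removed element, so removing many elements from a large array can take quadratic time in the worst case.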
answered Nov 24 '22 23:11 by Michael Kohl