Ruby Koans: regex parentheses "capture" matched content?

Tags: ruby

I'm going through about_regular_expressions.rb and don't understand exactly what's happening here:

def test_variables_can_also_be_used_to_access_captures
    assert_equal "Gray, James", "Name:  Gray, James"[/(\w+), (\w+)/]
    assert_equal "Gray", $1
    assert_equal "James", $2
end

It seems to me like the use of the parentheses in the regular expression creates two new variables under the hood ($1 and $2).

Is this correct?

But then I did this:

def test_variables_can_also_be_used_to_access_captures
    assert_equal "Gray, James", "Name:  Gray, James"[/(\w+), (\w+)/]
    assert_equal "Smith, Bobert", "Name:  Smith, Bobert"[/(\w+), (\w+)/]
    assert_equal "Smith", $1
    assert_equal "Bobert", $2
end

And it captured "Smith" and "Bobert". I guess the previous values were just overwritten each time a new regex with parentheses is used?

If I then try to capture just one word:

def test_variables_can_also_be_used_to_access_captures
    assert_equal "Gray, James", "Name:  Gray, James"[/(\w+), (\w+)/]
    assert_equal "Smith, Bobert", "Name:  Smith, Bobert"[/(\w+), (\w+)/]
    assert_equal "Smith", $1
    assert_equal "Bobert", $2
    assert_equal "Susan,", "Name:  Susan, whatever"[/(\w+),/]
    assert_equal "Susan", $1
    assert_equal nil, $2
end

$2 is gone... (no more "Bobert")

Can anyone shed some light on what happens under the hood? Or point me in the right direction?

Robert asked Oct 18 '12 01:10



1 Answer

You are right. Every time a regex match is attempted, the special variables $~, $&, ..., $1, $2, ... are overwritten. In your last example, the regex has only one (...) group, so there is nothing to capture for $2, and it is nil.
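As a rough illustration of the overwriting, here is a small sketch that reuses the strings from the question. The "no comma here" string and the local names first, second and third are made up for this example; the last step assumes a failed match, which resets $~ to nil.

```ruby
# $1, $2, ... are read from $~, the MatchData of the most recent match attempt.
"Name:  Gray, James"[/(\w+), (\w+)/]
first = [$1, $2]     # ["Gray", "James"]

# A new match replaces $~, so the old captures are no longer reachable via $1/$2.
"Name:  Susan, whatever"[/(\w+),/]
second = [$1, $2]    # ["Susan", nil] because this pattern has only one group

# A failed match (illustrative string, not from the question) sets $~ to nil.
"no comma here"[/(\w+),/]
third = [$~, $1]     # [nil, nil]
```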

When you need to use the results of several matches side by side, the technique I use is to keep each match's data in its own variable. That is, immediately after the first regex match, assign match1 = $~. Then run the next regex match and assign match2 = $~, and so on. Later, you can extract the captures from these variables. For example, to refer back to the $1 of the first match after several more matches have run, use match1[1].
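A minimal sketch of that technique, using the strings from the question (match3 and match4 are just illustrative names that continue the match1, match2 pattern):

```ruby
"Name:  Gray, James"[/(\w+), (\w+)/]
match1 = $~    # save the MatchData before the next match replaces $~

"Name:  Smith, Bobert"[/(\w+), (\w+)/]
match2 = $~

"Name:  Susan, whatever"[/(\w+),/]
match3 = $~

# $1 and $2 now describe only the last match, but the saved objects are intact.
match1[1]      # "Gray"
match2[2]      # "Bobert"
match3[1]      # "Susan"

# String#match returns the MatchData directly, which avoids touching $~ yourself.
match4 = "Name:  Gray, James".match(/(\w+), (\w+)/)
match4[2]      # "James"
```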

sawa answered Oct 09 '22 16:10
