
Ruby - best way to extract regex capture groups?

Tags: regex, ruby

I was reading a question about regex group matching and saw that there are several ways to reference capture groups from a regular expression, namely:

  1. The match method on a string, e.g. string.match(/(^.*)(:)(.*)/i).captures
  2. Perl-esque capture group variables such as $1, $2, etc., which are set by a test like if string =~ /(^.*)(:)(.*)/i
  3. Update: As mentioned by 0xCAFEBABE, there is a third option too: the Regexp.last_match method

Which is better? With option 1, you would need an if statement to guard against a nil result for safety, so why not extract the information right there, instead of taking a second step to call the captures method? Option 2 therefore looks more convenient to me.
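The three options can be compared side by side in a minimal sketch. The constant PATTERN, the method names, and the sample input are made up for illustration and are not from the question; each method returns nil when the string does not match.

```ruby
# The pattern from the question, stored in a hypothetical constant.
PATTERN = /(^.*)(:)(.*)/i

# Option 1: String#match returns a MatchData object, or nil on no match,
# so the result has to be checked before calling captures.
def via_captures(line)
  m = line.match(PATTERN)
  m && m.captures
end

# Option 2: =~ sets $1, $2, ... as a side effect of a successful match.
def via_globals(line)
  [$1, $2, $3] if line =~ PATTERN
end

# Option 3: Regexp.last_match returns the MatchData of the most recent match.
def via_last_match(line)
  Regexp.last_match.captures if line =~ PATTERN
end

via_captures("key: value")   # => ["key", ":", " value"]
via_globals("key: value")    # => ["key", ":", " value"]
via_last_match("key: value") # => ["key", ":", " value"]
via_captures("no separator") # => nil
```

All three need a guard of some kind; the difference is whether the match data lives in an explicit local variable or in implicit per-frame state.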

Friedrich 'Fred' Clausen asked Jul 19 '13 10:07


2 Answers

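Since Ruby 2.4, MatchData has had a named_captures method, which can be used like this. Just add the ?<some_name> syntax inside a capture group (the named-group syntax itself has been available since Ruby 1.9). Note that once a pattern contains a named group, Ruby no longer captures the plain unnamed groups, which is why captures returns only ["a"] in the second pair of examples.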

/(\w)(\w)/.match("ab").captures # => ["a", "b"]
/(\w)(\w)/.match("ab").named_captures # => {}

/(?<some_name>\w)(\w)/.match("ab").captures # => ["a"]
/(?<some_name>\w)(\w)/.match("ab").named_captures # => {"some_name"=>"a"}

Even more relevant, you can reference a named capture by name!

result = /(?<some_name>\w)(\w)/.match("ab")
result["some_name"] # => "a" 
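As a rough sketch of how this could apply to the pattern from the question, the groups can be given names and read back by symbol. The group names (key, sep, value), the constant KEY_VALUE, and the method parse_pair are hypothetical, chosen only for illustration. Indexing a MatchData by name does not depend on named_captures.

```ruby
# The question's pattern with hypothetical names added to each group.
KEY_VALUE = /(?<key>^.*)(?<sep>:)(?<value>.*)/i

# Returns a hash with the key and the stripped value, or nil on no match.
def parse_pair(line)
  m = KEY_VALUE.match(line)
  return nil unless m

  # MatchData#[] accepts a group name as a String or a Symbol.
  { key: m[:key], value: m[:value].strip }
end

pair = parse_pair("host: example.org")
pair[:key]   # => "host"
pair[:value] # => "example.org"
```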
alex answered Oct 08 '22 18:10


For simple tasks, directly accessing the pseudo variables $1, etc. may be shorter and easier, but when things get complicated, accessing the captures via MatchData instances is (nearly) the only way to go.

For example, suppose you are doing nested gsub:

string1.gsub(regex1) do |string2|
  string2.gsub(regex2) do
    ... # Impossible/difficult to refer to match data of outer loop
  end
end

Within the inner block, suppose you wanted to refer to a captured group of the outer gsub. Calling $1, $2, etc. would not give the right result, because the inner gsub has replaced the last match data. This is a likely source of bugs.
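A small sketch of that failure, using a hypothetical method and input: the intent is to prefix every digit of a value with its key, but $1 inside the inner block no longer refers to the outer match.

```ruby
# Hypothetical helper that tries to turn "a=1" into "a1" (key before each digit).
def prefix_digits_naive(text)
  text.gsub(/(\w+)=(\d+)/) do
    # At this point $1 is the key and $2 is the value of the outer match.
    $2.gsub(/(\d)/) do
      # The inner gsub has replaced the last match data, so $1 is now the
      # digit captured by the inner pattern, not the outer key.
      "#{$1}#{$&}"
    end
  end
end

prefix_digits_naive("a=1 b=22") # => "11 2222", not the intended "a1 b2b2"
```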

Instead, refer to the captured groups via match data saved in local variables:

string1.gsub(regex1) do |string2|
  m1 = $~
  string2.gsub(regex2) do
    m2 = $~
    ... # match data of the outer loop can be accessed via `m1`.
        # match data of the inner loop can be accessed via `m2`.
  end
end
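A runnable version of this pattern might look like the following. The method name, the patterns, and the input are hypothetical; the point is only that the outer MatchData is saved in a local variable before the inner gsub runs.

```ruby
# Hypothetical helper: prefixes every digit of a value with its key,
# e.g. "b=22" becomes "b2b2".
def prefix_digits(text)
  text.gsub(/(\w+)=(\d+)/) do
    outer = $~ # save the outer MatchData before the inner gsub overwrites $~
    outer[2].gsub(/\d/) do
      inner = $~ # MatchData of the inner match
      "#{outer[1]}#{inner[0]}"
    end
  end
end

prefix_digits("a=1 b=22") # => "a1 b2b2"
```

Because outer and inner are ordinary local variables, both matches stay available no matter how many further regex operations run inside the blocks.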

In short, if you want to do short hackish things for simple tasks, you can use the pseudo variables. If you want to keep your code more structured and expandable, you should access data through match data.

sawa answered Oct 08 '22 18:10