ruby regex scan versus =~

Tags: regex, ruby

The Ruby (1.9.3) documentation seems to imply that scan is equivalent to =~ except that

  1. scan returns multiple matches, while =~ returns only the first occurrence, and
  2. scan returns the match data, while =~ returns the index.

However, in the following example, the two methods seem to return different results for the same string and expression. Why is that?

1.9.3p0 :002 > str = "Perl and Python - the two languages"
 => "Perl and Python - the two languages" 
1.9.3p0 :008 > exp = /P(erl|ython)/
 => /P(erl|ython)/ 
1.9.3p0 :009 > str =~ exp
 => 0 
1.9.3p0 :010 > str.scan exp
 => [["erl"], ["ython"]] 

If the index of the first match is 0, shouldn't scan return "Perl" and "Python" instead of "erl" and "ython"?

Thanks

Anand asked Apr 24 '12 03:04


1 Answer

When given a regular expression without capturing groups, scan will return an array of strings, where each string represents a match of the regular expression. If you use scan(/P(?:erl|ython)/) (which is the same as your regex except without capturing groups), you'll get ["Perl", "Python"], which is what you expect.
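As a quick check, here is a minimal sketch that runs both forms of the pattern against the string from the question. The variable names `whole_matches` and `captures` are only for illustration.

```ruby
str = "Perl and Python - the two languages"

# Non-capturing group (?:...): each element is the whole match.
whole_matches = str.scan(/P(?:erl|ython)/)

# Capturing group (...): each element is an array of that match's captures.
captures = str.scan(/P(erl|ython)/)

p whole_matches  # => ["Perl", "Python"]
p captures       # => [["erl"], ["ython"]]
```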

However, when given a regex with capturing groups, scan will return an array of arrays, where each sub-array contains the captures of a given match. For example, with the regex (\w*):(\w*) you'll get an array of arrays where each sub-array contains two strings: the part before the colon and the part after the colon. In your example each sub-array contains one string: the part matched by (erl|ython). That is also why =~ returning 0 is consistent with the scan result: the first match really is "Perl" starting at index 0, but scan reports only its capture, "erl".
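To illustrate, here is a rough sketch of the colon case, plus one way to recover the whole matches while keeping the capturing group. The input string "name:ruby version:193" is made up for the example, and the block form of scan with Regexp.last_match is just one option, not the only one.

```ruby
# Two capturing groups: each sub-array holds [before_colon, after_colon].
# The input string here is hypothetical, chosen only for illustration.
pairs = "name:ruby version:193".scan(/(\w*):(\w*)/)
p pairs  # => [["name", "ruby"], ["version", "193"]]

str = "Perl and Python - the two languages"
exp = /P(erl|ython)/

# =~ returns the index of the first match; the match itself is "Perl".
first_index = str =~ exp
first_match = Regexp.last_match(0)
p first_index  # => 0
p first_match  # => "Perl"

# Keeping the capturing group but collecting whole matches:
# inside the block, Regexp.last_match refers to the current match.
full_matches = []
str.scan(exp) { full_matches << Regexp.last_match(0) }
p full_matches  # => ["Perl", "Python"]
```

If you don't need the captures at all, the non-capturing (?:...) form is simpler than the block.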

sepp2k answered Nov 07 '22 03:11