
RegExp#match returns only one match

Tags:

regex

ruby

pcre

Please explain why match() returns only one match instead of four, for example:

s = 'aaaa'
p /a/.match(s).to_a # => ["a"]

Stranger still, with grouping match() returns two matches, regardless of the actual number of matches:

s = 'aaaa'
p /(a)/.match(s).to_a # => ["a", "a"]

s = 'a aaa a'
p /(a)/.match(s).to_a # => ["a", "a"]

Thanks for your answers.

shau-kote asked Oct 15 '13 05:10


2 Answers

You need to use .scan() to match more than once:

p s.scan(/a/).to_a

With grouping, .match() returns one result for the overall match plus one for each group. In your regex the group spans the entire pattern, so both results are the same.

Some examples:

> /(a)/.match(s).to_a
=> ["a", "a"]           # First: Group 0 (overall match), second: Group 1
> /(a)+/.match(s).to_a
=> ["aaaa", "a"]        # Regex matches entire string, group 1 matches the last a
> s.scan(/a/).to_a
=> ["a", "a", "a", "a"] # Four matches, no groups
> s.scan(/(a)/).to_a
=> [["a"], ["a"], ["a"], ["a"]] # Four matches, each containing one group
> s.scan(/(a)+/).to_a
=> [["a"]]              # One match, the last match of group 1 is retained
> s.scan(/(a+)(a)/).to_a
=> [["aaa", "a"]]       # First group matches aaa, second group matches final a
> s.scan(/(a)(a)/).to_a
=> [["a", "a"], ["a", "a"]] # Two matches, both groups participate once per match
Tim Pietzcker answered Oct 16 '22 09:10


By design, match matches only once. A single match corresponds to a MatchData instance, and MatchData#to_a returns an array whose 0th element is the whole match and whose n-th element (for n ≥ 1) is the n-th capture. A capture is whatever matches inside (). If the regex has no (), the array contains only the whole match.

The reason there is more than one "a" in ["a", "a"] with /(a)/ is that a single match has a capture in addition to the whole match: the first "a" is the whole match, corresponding to /(a)/, and the second "a" is the first capture, corresponding to the a inside (a).
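To make the layout of a MatchData concrete, here is a minimal sketch using the string from the question. The variable names are only illustrative; MatchData#[], #captures and #to_a are standard Ruby methods.

```ruby
# Sketch: inspect the MatchData produced by a single match.
s = 'aaaa'
m = /(a)/.match(s)

whole     = m[0]        # element 0 is the whole match
first_cap = m[1]        # element 1 is the first capture group
captures  = m.captures  # the capture groups only, without the whole match
as_array  = m.to_a      # the whole match followed by every capture

p whole, first_cap, captures, as_array

# Without any (), the array holds only the whole match.
no_groups = /a/.match(s).to_a
p no_groups
```

So to_a on a MatchData always has one element more than the number of groups, however many times the pattern could match elsewhere in the string.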

If you want to find arbitrarily many matches, use scan.
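As a rough sketch of the difference, scan collects every non-overlapping match as a string. If a MatchData per match is needed as well (for offsets, say), one commonly used idiom is to wrap scan in an enumerator and read Regexp.last_match in the block. The variable names below are just for illustration.

```ruby
# Sketch: collect every match, first as strings, then as MatchData objects.
s = 'a aaa a'

# scan returns all non-overlapping matches of the pattern.
matches = s.scan(/a/)

# Turn scan into an enumerator so that each iteration can read the
# MatchData of the match that scan has just found.
match_data = s.to_enum(:scan, /a/).map { Regexp.last_match }

# begin(0) is the start offset of the whole match within s.
offsets = match_data.map { |m| m.begin(0) }

p matches
p offsets
```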

sawa answered Oct 16 '22 08:10