Odd regexp behaviour - matches only first and last capture group

I am trying to write a regexp that matches a comma-separated list of words and captures every word. For example, the line "   apple , banana ,orange,peanut  " should match, and the captures should be apple, banana, orange, peanut. To do that I use the following regexp:

^\s*([a-z_]\w*)(?:\s*,\s*([a-z_]\w*))*\s*$

It successfully matches the string, but only apple and peanut are captured. The behaviour is the same in both C# and Perl, so I assume I am missing something about how regexp matching works. Any ideas? :)

bazzilic asked Nov 19 '12 08:11


2 Answers

The value given by match.Groups[2].Value is just the last value captured by the second group.

To find all the values, look at match.Groups[2].Captures[i].Value where in this case i ranges from 0 to 2. (As well as match.Groups[1].Value for the first group.)
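As a minimal sketch of the above, the following console program runs the pattern from the question against the sample line and gathers group 1 plus every capture of group 2. The class name CaptureDemo and the variable names are made up for illustration; only Regex.Match, Groups and Captures come from the .NET regex API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class CaptureDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string text = "   apple , banana ,orange,peanut  ";
        // Same pattern as in the question.
        string pattern = @"^\s*([a-z_]\w*)(?:\s*,\s*([a-z_]\w*))*\s*$";

        Match match = Regex.Match(text, pattern);
        if (!match.Success)
        {
            Console.WriteLine("no match");
            return;
        }

        // Group 1 takes part in the match once. Group 2 sits inside the
        // repeated (?:...)* part, so Groups[2].Value only holds the last
        // iteration, while Groups[2].Captures keeps every iteration.
        var words = new[] { match.Groups[1].Value }
            .Concat(match.Groups[2].Captures.Cast<Capture>().Select(c => c.Value))
            .ToList();

        Console.WriteLine(string.Join(", ", words)); // apple, banana, orange, peanut
    }
}
```

The Captures collection is specific to .NET. In Perl, a repeated group only exposes its last capture, so a different approach (such as a global match in a loop) would be needed there.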

(+1 for question, I learned something today!)

Rawling answered Sep 24 '22 15:09


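Try this, which collects each word with Regex.Matches instead of matching the whole line at once (note that it does not check that the line as a whole is well formed):

using System.Linq;
using System.Text.RegularExpressions;

string text = "   apple , banana ,orange,peanut";

// Each match is one word; the named group "word" holds it without the padding or comma.
var matches = Regex.Matches(text, @"\s*(?<word>\w+)\s*,?")
        .Cast<Match>()
        .Select(x => x.Groups["word"].Value)
        .ToList();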

Rui Jarimba answered Sep 22 '22 15:09