Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UB: C#'s Regex.Match returns whole string instead of part when matching

Attention! This is NOT related to Regex problem, matches the whole string instead of a part


Hi all. I try to do

Match y = Regex.Match(someHebrewContainingLine, @"^.{0,9} - \[(.*)?\s\d{1,3}");

Aside from the other VS hebrew quirks (how do you like replacing ] for [ when editing the string?), it occasionally returns the crazy results:

Match.Captures.Count = 1;
Match.Captures[0] = whole string! (not expected)
Match.Groups.Count = 2; (not expected)
Match.Groups[0] = whole string again! (not expected)
Match.Groups[1] = (.*)? value (expected).

Regex.Matches() is acting same way.

What can be a general reason for such behaviour? Note: it's not acting this way on a simple test strings like Regex.Match("-היי45--", "-(.{1,5})-") (sample is displayed incorrectly!, please look to the page's source code), there must be something with the regex which makes it greedy. The matched string contains [ .... ], but simply adding them to test string doesn't causes the same effect.

like image 255
kagali-san Avatar asked Dec 20 '22 17:12

kagali-san


2 Answers

I hit this problem when I first started using the .NET regex, too. The way to understand this is to understand that the Group member of Match is the nesting member. You have to traverse Groups in order to get down to lower captures. Groups also have Capture members. The Match is kind of like the top "Group" in that it represents the successful "match" of the whole string against your expression. The single input string can have multiple matches. The Captures member represents the match of your full expression.

Whenever you have a single capture as you have, Group[1] will always be the data you are interested in. Look at this page. The source code in examples 2 and 3 is hardcoded to print out Groups[1].

Remember that a single capture can capture multiple substrings in a single match operation. If this were the case then you would see Match.Groups[1].Captures.Count be greater than 1. Also, I think if you passed in multiple matching lines of text to the single Match call, then you would see Match.Captures.Count be greater than 1, but each top-level Match.Captures would be the full string matched by your full expression.

like image 142
Kent Avatar answered Dec 23 '22 07:12

Kent


There is one capture group in the pattern; that is group 1.

There is always group 0, which is the entire match.

Therefore there are a total of 2 groups.

like image 30
MRAB Avatar answered Dec 23 '22 07:12

MRAB