
Counting overlapping matches with Regex in C# [duplicate]

Tags:

c#

regex

The following code evaluates to 2 instead of 4:

Regex.Matches("020202020", "020").Count;

I'm guessing the regex starts looking for the next match from the end of the previous match. Is there any way to prevent this? I have a string of '0's and '2's, and I'm trying to count how many times I have three '2's in a row, four '2's in a row, etc.

KristjanJonsson asked Aug 13 '12 22:08


3 Answers

This will return 4 as you expect:

Regex.Matches("020202020", @"0(?=20)").Count;

The lookahead matches the 20 without consuming it, so the next match attempt starts at the position following the first 0. You can even do the whole regex as a lookahead:

Regex.Matches("020202020", @"(?=020)").Count;

The regex engine automatically bumps ahead one position each time it makes a zero-length match. So, to find all runs of three 2's or four 2's, you can use:

Regex.Matches("22222222", @"(?=222)").Count;  // 6

...and:

Regex.Matches("22222222", @"(?=2222)").Count;  // 5
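
As a rough sketch of how this extends to the "three in a row, four in a row, etc." counts from the question, the run length can be made a parameter. `RunCounter` and `CountRuns` are hypothetical names chosen for illustration; only `Regex.Matches` comes from .NET. Note that this counts every starting position, so overlapping runs inside a longer run are all included.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunCounter
{
    // Hypothetical helper: counts every position where `length` consecutive '2's begin.
    public static int CountRuns(string input, int length)
    {
        // The lookahead is zero-width, so overlapping runs are all counted.
        string pattern = "(?=2{" + length + "})";
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        for (int n = 3; n <= 5; n++)
        {
            Console.WriteLine(n + " in a row: " + CountRuns("22222222", n));
        }
    }
}
```

For the eight-character input this prints 6, 5 and 4 for n = 3, 4 and 5.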

EDIT: Looking over your question again, it occurs to me you might be looking for 2's interspersed with 0's:

Regex.Matches("020202020", @"(?=20202)").Count;  // 2

If you don't know how many 0's there will be, you can use this:

Regex.Matches("020202020", @"(?=20*20*2)").Count;  // 2

And of course, you can use quantifiers to reduce repetition in the regex:

Regex.Matches("020202020", @"(?=2(?:0*2){2})").Count;  // 2
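
The quantifier version can be parameterized in the same spirit. This is only a sketch, assuming "n 2's in a row" means n 2's separated by any number of 0's; `SpacedRunCounter` and `CountSpacedRuns` are made-up names, not framework APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacedRunCounter
{
    // Hypothetical helper: counts positions where a '2' is followed by
    // (count - 1) more '2's, each optionally preceded by any number of '0's.
    public static int CountSpacedRuns(string input, int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        string pattern = "(?=2(?:0*2){" + (count - 1) + "})";
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountSpacedRuns("020202020", 3)); // 2, same as the pattern above
        Console.WriteLine(CountSpacedRuns("020202020", 4)); // 1
    }
}
```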

Alan Moore answered Oct 03 '22 10:10


Indeed, the regex engine resumes searching from where the previous match ended. You can work around this by using a lookahead pattern. I'm not a .NET guy, but try this: "(?=020)." Translation: "find me any single character, where this character and the next two characters are 020". The trick is that the match is only one character wide, not three, so you will get all the matches in the string, even if they overlap.

(You could also write it as "0(?=20)", but that's less clear, to humans at least :p)
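
To make the "one character wide" point visible, here is a small sketch that prints where each match starts and how long it is. The `MatchPositions` class and `Describe` helper are hypothetical; `Match.Index` and `Match.Length` are standard .NET properties.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchPositions
{
    // Hypothetical helper: formats each match as "index:length".
    public static string Describe(string input, string pattern)
    {
        return string.Join(" ", Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Index + ":" + m.Length));
    }

    public static void Main()
    {
        Console.WriteLine(Describe("020202020", "020"));      // 0:3 4:3
        Console.WriteLine(Describe("020202020", "(?=020).")); // 0:1 2:1 4:1 6:1
        Console.WriteLine(Describe("020202020", "0(?=20)"));  // 0:1 2:1 4:1 6:1
    }
}
```

The plain pattern consumes three characters per match and so skips the occurrences at indexes 2 and 6, while both lookahead forms consume only one character and find all four.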

Amadan answered Oct 03 '22 08:10


Try this, using a zero-width positive lookbehind:

Regex.Matches("020202020", @"(?<=020)").Count;

Worked for me, yields 4 matches.
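
One thing worth knowing if you need positions and not just the count: with a lookbehind, each zero-width match sits just after an occurrence, not at its start. A minimal sketch, with `LookbehindPositions` and `MatchIndexes` as hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindPositions
{
    // Hypothetical helper: returns the index of every match of the pattern.
    public static int[] MatchIndexes(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();
    }

    public static void Main()
    {
        // Lookbehind: indexes point just past each "020".
        Console.WriteLine(string.Join(",", MatchIndexes("020202020", @"(?<=020)"))); // 3,5,7,9
        // Lookahead: indexes point at the start of each "020".
        Console.WriteLine(string.Join(",", MatchIndexes("020202020", @"(?=020)")));  // 0,2,4,6
    }
}
```

Subtracting 3 (the length of "020") from the lookbehind indexes gives the same start positions as the lookahead version.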

My favorite regex reference is Regular Expression Language - Quick Reference. For a quick way to try out a regex, I often use Free Regular Expression Designer, especially for complex patterns.

crlanglois answered Oct 03 '22 08:10