Overlapping matches in Regex

Tags: c#, regex, overlap

I can't seem to find an answer to this problem, and I'm wondering if one exists. Simplified example:

Consider a string "nnnn", where I want to find all matches of "nn", including those that overlap with each other. So the regex would provide the following 3 matches (brackets mark the matched part):

[nn]nn
n[nn]n
nn[nn]

I realize this is not exactly what regexes are meant for, but walking the string and parsing this manually seems like an awful lot of code, considering that in reality the matches would have to be done using a pattern, not a literal string.

651

asked Nov 26 '08 11:11

jevakallio

3 Answers

Update 2016:

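To get the three overlapping matches nn, nn, nn, SDJMcHattie proposes in the comments putting the pattern in a capturing group inside a lookahead: (?=(nn)) (see regex101). The match itself is empty, because a lookahead consumes no characters; the text you want is in group 1.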

(?=(nn))
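
In C#, this could be used roughly as follows. This is a minimal sketch: the names `OverlapDemo` and `FindOverlapping` are made up for illustration, and it assumes the caller's pattern is simply wrapped in `(?=(...))`. Only `Regex.Matches` and `Match.Groups` come from System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Hypothetical helper: wraps the pattern in (?=(...)) and returns the
    // start index and captured text for every position where it matches.
    public static List<(int Index, string Text)> FindOverlapping(string input, string pattern)
    {
        var results = new List<(int Index, string Text)>();
        var regex = new Regex("(?=(" + pattern + "))");
        foreach (Match m in regex.Matches(input))
        {
            // m.Value is empty here; the overlapping text is in group 1.
            results.Add((m.Index, m.Groups[1].Value));
        }
        return results;
    }

    public static void Main()
    {
        // Prints "0: nn", "1: nn", "2: nn" on separate lines.
        foreach (var (index, text) in FindOverlapping("nnnn", "nn"))
        {
            Console.WriteLine($"{index}: {text}");
        }
    }
}
```

Note that if the wrapped pattern has capturing groups of its own, their numbers shift by one, since the outer group becomes group 1.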

Original answer (2008)

A possible solution could be to use a positive lookbehind:

(?<=n)n

It would give you the end position of each of the three overlapping matches (brackets mark the n that is matched):

n[n]nn
nn[n]n
nnn[n]

As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example).

Rather than his proposal (?=nn)n, I would prefer the simpler form:

(n)(?=(n))

Each match would start at the first position of a string you want, with the first n in group(1) and the second n captured in group(2).

That is so because:

Any valid regular expression can be used inside the lookahead.
If it contains capturing parentheses, the backreferences will be saved.

So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
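
To make the two groups concrete, here is a small C# sketch. It assumes `\d` stands in for 'n' and uses made-up names (`LookaheadPairs`, `Pairs`); it is an illustration of the idea, not code from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadPairs
{
    // Hypothetical helper: returns each overlapping pair as group 1 + group 2.
    public static List<string> Pairs(string input)
    {
        var pairs = new List<string>();
        // Group 1 is consumed by the match. Group 2 is only peeked at by the
        // lookahead, so that character can still start the next match.
        var regex = new Regex(@"(\d)(?=(\d))");
        foreach (Match m in regex.Matches(input))
        {
            pairs.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return pairs;
    }

    public static void Main()
    {
        // Prints: 12, 23, 34
        Console.WriteLine(string.Join(", ", Pairs("1234")));
    }
}
```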

112

answered Nov 13 '22 22:11

VonC

Using a lookahead with a capturing group works, at the expense of making your regex slower and more complicated. An alternative solution is to tell the Regex.Match() method where the next match attempt should begin. Try this:

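// Requires: using System.Text.RegularExpressions;
Regex regexObj = new Regex("nn");
Match matchObj = regexObj.Match(subjectString);
while (matchObj.Success) {
    // Use matchObj.Value and matchObj.Index here.
    // Restart one character after the start of this match, not after its end.
    matchObj = regexObj.Match(subjectString, matchObj.Index + 1);
}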
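
The loop above could be wrapped into a reusable method along these lines. This is a sketch, not a definitive implementation: `RestartLoop` and `AllMatches` are hypothetical names, and the bounds check is an addition for patterns that can match the empty string at the end of the input, where `Regex.Match(string, int)` would otherwise be asked for a start position past the end.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RestartLoop
{
    // Hypothetical helper: collects the first match found from each restart
    // position, so matches that overlap earlier ones are included.
    public static List<(int Index, string Text)> AllMatches(string subjectString, string pattern)
    {
        var found = new List<(int Index, string Text)>();
        var regexObj = new Regex(pattern);
        Match matchObj = regexObj.Match(subjectString);
        while (matchObj.Success)
        {
            found.Add((matchObj.Index, matchObj.Value));
            // A start position greater than the string length throws,
            // so stop once the last match began at the very end.
            if (matchObj.Index >= subjectString.Length) break;
            matchObj = regexObj.Match(subjectString, matchObj.Index + 1);
        }
        return found;
    }

    public static void Main()
    {
        // Prints "0: nn", "1: nn", "2: nn" on separate lines.
        foreach (var (index, text) in AllMatches("nnnn", "nn"))
        {
            Console.WriteLine($"{index}: {text}");
        }
    }
}
```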

answered Nov 13 '22 20:11

Jan Goyvaerts

AFAIK, there is no pure regex way to do that in one go (i.e. returning the three captures you request without a loop).

What you can do is find the pattern once, then loop, restarting the search at an offset (found position + 1). You have to combine the regex with a little code.

[EDIT] Great, I am downvoted when I basically said what Jan showed...
[EDIT 2] To be clear: Jan's answer is better. Not more precise, but certainly more detailed, and it deserves to be chosen. I just don't understand why mine is downvoted, since I still see nothing incorrect in it. Not a big deal, just annoying.

answered Nov 13 '22 21:11

PhiLho
