I can't seem to find an answer to this problem, and I'm wondering if one exists. Simplified example:
Consider a string "nnnn", where I want to find all matches of "nn" - but also those that overlap with each other. So the regex would provide the following 3 matches:
I realize this is not exactly what regexes are meant for, but walking the string and parsing this manually seems like an awful lot of code, considering that in reality the matches would have to be done using a pattern, not a literal string.
Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for a up to 3-digit octal number, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode, \uhhhhhhhh for a 8-digit Unicode.
\\. matches the literal character . . the first backslash is interpreted as an escape character by the Emacs string reader, which combined with the second backslash, inserts a literal backslash character into the string being read. the regular expression engine receives the string \. html?\ ' .
“Non-overlapping” means that the string is searched through from left to right, and the next match attempt starts beyond the previous match. If the regex contains one or more capturing groups, re. findall() returns an array of tuples, with each tuple containing text matched by all the capturing groups.
By default, the match is case sensitive. Example : [^abc] will match any character except a,b,c . [first-last] – Character range: Matches any single character in the range from first to last.
Update 2016:
To get nn
, nn
, nn
, SDJMcHattie proposes in the comments (?=(nn))
(see regex101).
(?=(nn))
Original answer (2008)
A possible solution could be to use a positive look behind:
(?<=n)n
It would give you the end position of:
As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example)
I would prefer to his proposition (?=nn)n
the simpler form:
(n)(?=(n))
That would reference the first position of the strings you want and would capture the second n in group(2).
That is so because:
So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
Using a lookahead with a capturing group works, at the expense of making your regex slower and more complicated. An alternative solution is to tell the Regex.Match() method where the next match attempt should begin. Try this:
Regex regexObj = new Regex("nn");
Match matchObj = regexObj.Match(subjectString);
while (matchObj.Success) {
matchObj = regexObj.Match(subjectString, matchObj.Index + 1);
}
AFAIK, there is no pure regex way to do that at once (ie. returning the three captures you request without loop).
Now, you can find a pattern once, and loop on the search starting with offset (found position + 1). Should combine regex use with simple code.
[EDIT] Great, I am downvoted when I basically said what Jan shown...
[EDIT 2] To be clear: Jan's answer is better. Not more precise, but certainly more detailed, it deserves to be chosen. I just don't understand why mine is downvoted, since I still see nothing incorrect in it. Not a big deal, just annoying.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With