While working on an answer to this question, I am trying to come to terms with the behavior and meaning of zero-length regular expression matches.
I often use www.regexr.com as a playground to test/debug/understand what's going on in regular expressions.
So we have this most banal scenario:
The regex is a*
The input string is dgwawa
(As a matter of fact, the string here is irrelevant)
Why does the tool report that this regex will match infinitely, given that it can match zero occurrences of the preceding character?
Why can't the result be 6 matches, one for each character position (since at every character, regardless of whether it is an a or not, there is a match, since zero matches is a match)?
How does it end up matching infinitely? Does it not check and advance one character at a time?
I wonder how and where it gets itself into an infinite loop.
A zero-length match can occur in several cases: in an empty input string, at the beginning of an input string, after the last character of an input string, or in between any two characters of an input string. Zero-length matches are easily identifiable because they always start and end at the same index position.
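A small sketch of that property, assuming a JavaScript runtime that has String.prototype.matchAll (ES2020). The helper name spans is made up for illustration. It records the start and end index of every match, so you can see that the empty ones start and end at the same position:

```javascript
// Hypothetical helper: return [start, end] for every match of a global regex.
function spans(re, str) {
  return Array.from(str.matchAll(re), (m) => [m.index, m.index + m[0].length]);
}

const emptyInput = spans(/a*/g, "");   // a single match in an empty string
const noLetterA = spans(/a*/g, "xyz"); // before, between and after the characters

console.log(emptyInput); // [ [ 0, 0 ] ]
console.log(noLetterA);  // [ [ 0, 0 ], [ 1, 1 ], [ 2, 2 ], [ 3, 3 ] ]
```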
The * quantifier matches the preceding element zero or more times. It's equivalent to the {0,} quantifier.
+: one or more (1+), e.g., [0-9]+ matches one or more digits such as '123', '000'.
*: zero or more (0+), e.g., [0-9]* matches zero or more digits. It accepts everything [0-9]+ accepts, plus the empty string.
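To make the difference concrete, here is a minimal sketch in JavaScript. The anchors ^ and $ are my addition so that each pattern has to cover the whole sample string. The sample values are only illustrative:

```javascript
// `*` allows zero repetitions, so it can succeed without consuming any input.
const star = /^[0-9]*$/;
const plus = /^[0-9]+$/;
const braces = /^[0-9]{0,}$/; // same meaning as `*`

const samples = ["123", "000", ""];
const results = samples.map((s) => ({
  input: s,
  star: star.test(s),
  plus: plus.test(s),
  braces: braces.test(s),
}));

// Only the empty string separates `+` from `*` and `{0,}` here.
console.log(results);
```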
You selected the JavaScript regex flavor in the regexr.com online regex tester. When you loop with RegExp.prototype.exec, the JavaScript regex engine does not advance the index automatically after a pattern matches an empty string.
That is why, to emulate the behavior of .NET Regex.Matches, PHP preg_match_all, Python re.finditer, etc., you need to advance the index manually to test each position.
See regex101.com test:
var re = /a*/g;
var str = 'dgwawa';
var m;
while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) { // <- this part
        re.lastIndex++;             // <- here
    }                               // <- is important
    document.body.innerHTML += "'" + m[0] + "'<br/>";
}
If you remove that if block, you will get an infinite loop.
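If you would rather not manage lastIndex yourself, the built-in String.prototype.match (with a global regex) and String.prototype.matchAll (ES2020) step past empty matches on their own. A minimal sketch, assuming a runtime that supports matchAll:

```javascript
const input = "dgwawa";

// With the g flag, match() returns every matched substring, empty ones included.
const viaMatch = input.match(/a*/g);

// matchAll() yields full match objects; here we keep only the matched text.
const viaMatchAll = Array.from(input.matchAll(/a*/g), (m) => m[0]);

console.log(viaMatch);    // [ '', '', '', 'a', '', 'a', '' ]
console.log(viaMatchAll); // [ '', '', '', 'a', '', 'a', '' ]
```

Neither call loops forever, because both move one position forward after an empty match.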
There are two very important things to mention in this regard:
There are actually 7 matches.
Let me enumerate them; the first number is the start (0-based), the second number is the length:
Match 1: 0 0
Match 2: 1 0
Match 3: 2 0
Match 4: 3 1 (a)
Match 5: 4 0
Match 6: 5 1 (a)
Match 7: 6 0
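The same enumeration can be reproduced with a short sketch, again assuming String.prototype.matchAll is available. The variable names are only illustrative:

```javascript
const subject = "dgwawa";

// Collect [start, length] for each match of a* in the subject string.
const table = Array.from(subject.matchAll(/a*/g), (m) => [m.index, m[0].length]);

table.forEach(([start, length], i) => {
  console.log(`Match ${i + 1}: ${start} ${length}`);
});
```

The seventh match is the empty one at index 6, just past the last character, which is why the count is 7 rather than 6.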
I use regex101 and it does what most of us expect from this simple regex (given there are regex dialects).
https://regex101.com/r/mN4jA4/1