Consider the following regex:
(^.)?
This matches a single character at the start of the string, if possible:
>> 'ab'.match(/(^.)?/)
Array [ "a", "a" ]
However, wrapping the . in a lookahead causes it to stop working:
>> 'ab'.match(/(^(?=.))?/)
Array [ "", undefined ]
The undefined value indicates that the group didn't participate in the match at all, rather than having matched an empty string. But I don't understand how the lookahead prevents the group from matching. I would have expected a result of ["", ""] here.
Even more curiously, this only happens when the capture group matches zero characters. If we replace the ^ anchor with something that consumes a character, it works correctly again:
>> 'ab'.match(/(a(?=.))?/)
Array [ "a", "a" ]
Removing the ? that makes the group optional fixes the output as well:
>> 'ab'.match(/(^(?=.))/)
Array [ "", "" ]
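The four cases above can be checked in one script. Because both "" and undefined are falsy, a truthiness test can't tell them apart; the sketch compares against undefined explicitly. The helper name describeGroup is made up for illustration.

```javascript
// Label a capture so that "" (matched empty) and undefined (did not
// participate) print differently.
function describeGroup(value) {
  return value === undefined
    ? 'undefined (group did not participate)'
    : JSON.stringify(value);
}

// The four patterns from the question, each applied to 'ab'.
const cases = {
  '(^.)?': 'ab'.match(/(^.)?/),
  '(^(?=.))?': 'ab'.match(/(^(?=.))?/),
  '(a(?=.))?': 'ab'.match(/(a(?=.))?/),
  '(^(?=.))': 'ab'.match(/(^(?=.))/),
};

for (const [pattern, m] of Object.entries(cases)) {
  console.log(`${pattern}: match ${JSON.stringify(m[0])}, group 1 ${describeGroup(m[1])}`);
}
```

Only the second pattern, an optional group whose match would be empty, reports undefined for group 1.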
Can someone explain why this happens? It doesn't make any sense to me.
This doesn’t need to involve lookaheads. Any optional group that ends up with an empty match is reported as not having matched at all.
> /()/.exec('foo')
['', '']
> /()?/.exec('foo')
['', undefined]
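A few more zero-width groups show the same thing. The x* patterns below are extra illustrations, not from the thread: (x*)? behaves like ()?, while (x*)+, which must repeat at least once, keeps its empty capture. That fits the question's observation that dropping the ? fixes the output.

```javascript
// Zero-width groups with and without a quantifier, applied to 'foo'.
// Array.from keeps just the match and capture, dropping index/input/groups.
const results = {
  '()': Array.from(/()/.exec('foo')),
  '()?': Array.from(/()?/.exec('foo')),
  '(x*)?': Array.from(/(x*)?/.exec('foo')),
  '(x*)+': Array.from(/(x*)+/.exec('foo')),
};

for (const [pattern, [whole, group]] of Object.entries(results)) {
  // Compare with undefined explicitly: '' and undefined are both falsy.
  const state = group === undefined ? 'not participating' : `matched ${JSON.stringify(group)}`;
  console.log(`${pattern}: whole ${JSON.stringify(whole)}, group 1 ${state}`);
}
```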
It’s pretty weird, yep.
> /(.*?)/.exec('foo')
['', '']
> /(.*?)?/.exec('foo')
['f', 'f']
There’s a V8 test case that suggests the behaviour is intentional. This step in the spec’s RepeatMatcher operation
If min is zero and y's endIndex is equal to x's endIndex, return failure.
seems relevant but is really hard to understand. If it’s actually causing the behaviour here (while trying to avoid having a group match consecutive empty strings?), I’d consider it a spec bug. Other languages don’t behave the same. (Not that they have to, but it’s another strike.)
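Assuming that step is the cause, a toy model is enough to reproduce every JavaScript result above. This is a hypothetical sketch, not V8’s implementation: matchOptionalGroup, emptyBody and lazyDotStarBody are made-up names, and the model assumes the optional group is the whole pattern, so nothing after it can fail.

```javascript
// Toy model (not the real engine) of a greedy optional group `(X)?`
// that ends the pattern. `body(input, start)` yields every end position
// X can reach from `start`, in the order a backtracking engine tries them.
// `emptyCheck` toggles the rule "If min is zero and y's endIndex is equal
// to x's endIndex, return failure".
function matchOptionalGroup(input, start, body, emptyCheck = true) {
  // Greedy `?`: try one iteration of the group first.
  for (const end of body(input, start)) {
    // With min = 0, an iteration that consumed nothing counts as failure,
    // so we backtrack into the body and take its next candidate.
    if (emptyCheck && end === start) continue;
    // Nothing follows the group, so any accepted iteration ends the match.
    return { end, capture: input.slice(start, end) };
  }
  // Every iteration failed: skip the group, leaving its capture undefined.
  return { end: start, capture: undefined };
}

// Body of `()`: can only match the empty string.
function* emptyBody(input, start) {
  yield start;
}

// Body of `.*?` (lazy): shortest match first, then one character more each
// time. Assumes the input has no line terminators, which `.` would reject.
function* lazyDotStarBody(input, start) {
  for (let end = start; end <= input.length; end++) {
    yield end;
  }
}

console.log(matchOptionalGroup('foo', 0, emptyBody));              // { end: 0, capture: undefined }
console.log(matchOptionalGroup('foo', 0, lazyDotStarBody));        // { end: 1, capture: 'f' }
console.log(matchOptionalGroup('foo', 0, lazyDotStarBody, false)); // { end: 0, capture: '' }
```

With the check on, ()? gives up and leaves the capture undefined, while (.*?)? backtracks into the lazy body until it consumes 'f'. With the check off, the lazy case accepts the empty iteration, which lines up with the Python and .NET results further down; the model says nothing about how those engines are actually implemented.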
Actually, this behaviour has been described before, with a comment saying it’s explained in the spec, but it really isn’t explained at all. (There’s a note mentioning (a*)* with no corresponding output, plus the step quoted above, which is offered without justification apart from some other notes about the problem of repeating empty matches, a problem everyone else seems to have solved in the more intuitive way.)
>>> re.match(r'(.*?)?', 'foo').group(0, 1)
('', '')
> Dim m = Regex.Match("foo", "(.*?)?")
> m.Success
True
> m.Length
0
> 'foo' =~ /(.*?)?/
0
> $1
""
> 'foo' =~ /(.*?)?/
('')
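For the record, the (a*)* pattern from that spec note can simply be run in JavaScript. The input strings here are chosen for illustration. An iteration that consumes nothing is dropped, while an earlier non-empty iteration keeps its capture.

```javascript
// The (a*)* pattern from the spec's note on empty repetitions.
// On 'b', the only possible iteration of (a*) is empty, so it is rejected
// and group 1 never participates.
const onB = Array.from(/(a*)*/.exec('b'));
// On 'aab', the first iteration consumes 'aa'; the next, empty iteration
// is rejected, but the capture from the first one survives.
const onAab = Array.from(/(a*)*/.exec('aab'));

console.log(onB);   // [ '', undefined ]
console.log(onAab); // [ 'aa', 'aa' ]
```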