If the metacharacter ? matches the preceding element zero or one time, then why does "ab".match(/a?/) return ["a"], but "ab".match(/b?/) return [""]?
Because that's the first match. The regex first tries to match at position 0, where regex #1 matches an a and regex #2 matches the empty string. If the search continued (as it does with the global flag), it would next attempt to match at position 1, where regex #1 matches the empty string and regex #2 matches the letter b. Finally, it would try to match at position 2 (the end of the string), where both regexes match the empty string.
Compare the returned matches with a global flag:
> "ab".match(/a?/)
["a"]
> "ab".match(/a?/g)
["a", "", ""]
> "ab".match(/b?/)
[""]
> "ab".match(/b?/g)
["", "b", ""]
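The per-position behaviour described above can be checked directly. The following is a minimal sketch, assuming an engine that supports the sticky flag (y); matchAt is a hypothetical helper name introduced only for illustration, not part of any API.

```javascript
// Hypothetical helper: try to match `source` at exactly `position` in `text`.
// The sticky flag (y) makes exec() match only at lastIndex, with no scanning ahead.
function matchAt(source, text, position) {
  const re = new RegExp(source, "y");
  re.lastIndex = position;
  const m = re.exec(text);
  return m === null ? null : m[0];
}

// Collect the match found at each of the three positions of "ab".
const perPosition = {};
for (const source of ["a?", "b?"]) {
  perPosition[source] = [0, 1, 2].map((pos) => matchAt(source, "ab", pos));
}

console.log(perPosition["a?"]); // [ 'a', '', '' ]
console.log(perPosition["b?"]); // [ '', 'b', '' ]
```

The first entry of each array is what match() returns without the global flag; the whole array corresponds to the result with the global flag.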
Why isn't [""] returned in the first case?
Due to the mechanisms of backtracking. When attempting to match at some position, the engine will try to greedily (see note 1) test all letters of the regex against the letters of the string. When it reaches the end of the regex that way, the match has succeeded. When a letter doesn't fit, it goes back in the regex to see whether anything can be omitted (with quantifiers such as * or ?) or whether alternatives (|) need to be considered, and then continues from there.
Example: Match /b?/ at position 0 of "ab":
// - "": ✓
/b/ - "a": ×
/b?/ - "": ✓ - succeed (end of regex)
(/b?/ here means that the "b" token is omitted)
Example: Match /a?/ at position 0 of "ab":
// - "": ✓
/a/ - "a": ✓ - succeed (end of regex)
Example: Match /ab?(bc)?/ at position 0 of "abc":
// - "": ✓
/a/ - "a": ✓
/ab/ - "ab": ✓
/ab(b)/ - "abc": ×
/ab(bc)?/ - "ab": ✓ - succeed (end of regex)
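The outcome of that last trace can be confirmed by inspecting the match object. This is only a small check, not a view into the engine's internal steps; the variable name greedyMatch is chosen here for illustration.

```javascript
// Greedy b? consumes the "b", so the optional group (bc) finds "c" instead of "b"
// and is omitted rather than matched.
const greedyMatch = "abc".match(/ab?(bc)?/);

console.log(greedyMatch[0]);    // "ab"
console.log(greedyMatch[1]);    // undefined (the group did not participate)
console.log(greedyMatch.index); // 0
```

An undefined capture group is how JavaScript reports that (bc)? was skipped.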
1: Usually, at least. Many regex flavours also provide quantifiers that are lazy or possessive if you want to control the exact matching behaviour. For example, /ab??(bc)?/ matches abc in "abc".
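To see the difference the lazy quantifier makes, the two patterns can be run side by side. This is a minimal sketch; the second input, "ab", is an extra case added here for illustration and does not appear in the note above.

```javascript
// Compare the greedy pattern /ab?(bc)?/ with its lazy variant /ab??(bc)?/.
const inputs = ["abc", "ab"];
const comparison = inputs.map((text) => ({
  text,
  greedy: text.match(/ab?(bc)?/)[0],
  lazy: text.match(/ab??(bc)?/)[0],
}));

console.log(comparison);
// [ { text: 'abc', greedy: 'ab', lazy: 'abc' },
//   { text: 'ab',  greedy: 'ab', lazy: 'a' } ]
```

With the lazy b??, the engine first tries to skip the "b", which leaves it available for the (bc) group in "abc". In "ab" the group cannot match, so it is omitted and the lazy pattern stops after "a".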