I have two sentences as input, for example:
<span>I love my red car.</span>
<span>I love my car.</span>
Now I want to match the text inside each pair of span tags AND, if present, the color.
If I use the following regex:
/<span>(.*?)(?P<color>red)(.*?)<\/span>/ms
Only the line containing the color is matched. So I thought I'd make the color group optional with the ? quantifier (zero or one):
/<span>(.*?)(?P<color>red)?(.*?)<\/span>/ms
Now both sentences are matched. Sadly, the color is no longer captured.
The question is: why? By putting ".*?" before the color part, I thought I had made that part non-greedy, so the color would be matched whenever it is present. But as I said, it isn't...
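A minimal sketch that reproduces the behavior, written in Python as an assumption (the question doesn't name its language). Python's re module accepts the same (?P<color>...) syntax; re.M | re.S stand in for the /ms flags, and the \/ escape is dropped because Python patterns have no / delimiter.

```python
import re

# The two sample sentences from the question.
sentences = [
    "<span>I love my red car.</span>",
    "<span>I love my car.</span>",
]

# Color group is mandatory: only the first sentence can match.
mandatory = re.compile(r"<span>(.*?)(?P<color>red)(.*?)</span>", re.M | re.S)

# Color group is optional (?): both sentences match, but color is not captured.
optional = re.compile(r"<span>(.*?)(?P<color>red)?(.*?)</span>", re.M | re.S)

for s in sentences:
    m1 = mandatory.search(s)
    m2 = optional.search(s)
    print(repr(s))
    print("  mandatory:", m1.group("color") if m1 else "no match")
    print("  optional: ", m2.group("color") if m2 else "no match")
```

Running it, the mandatory pattern reports red for the first sentence and no match for the second, while the optional pattern reports None for both.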
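The first (.*?) will match the empty string between > and I, and since it's lazy, it immediately tests the next part of the regex: (?P<color>red)?. But there's no red at that point, so the zero option of ? 'activates' and the regex continues to the next part, which is (.*?). This one also starts by matching the empty string between > and I, and since it's lazy, it checks the next part of the regex: <\/span> (I'm treating it as a whole). That fails, so the second (.*?) expands one character at a time until <\/span> can match, i.e. it consumes the whole sentence. Because the overall match succeeds at that point, the engine never backtracks into the first (.*?) to try a longer prefix, so the optional color group never gets a chance to match red.
Indeed, your results[1] will be empty, as will results[color] (I don't remember whether you have to quote color or not), and results[3] will contain "I love my red car.".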
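The group values described above can be checked with a short Python sketch (Python chosen as an assumption, as the question doesn't name its language). One caveat: Python reports a group that took no part in the match as None, while other regex APIs may report an empty string for it instead.

```python
import re

pattern = re.compile(r"<span>(.*?)(?P<color>red)?(.*?)</span>", re.M | re.S)
m = pattern.search("<span>I love my red car.</span>")

# Group 1: the first lazy (.*?) stopped after zero characters.
print(repr(m.group(1)))
# Group 2 ("color"): the optional group took its zero branch and did not participate.
print(m.group("color"))
# Group 3: the second lazy (.*?) expanded until </span> could match.
print(repr(m.group(3)))
```

This prints '', None and 'I love my red car.' on three lines.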
Hmm, one workaround is to use OR (alternation), as NickC mentioned in his answer. Another option is a negative lookahead that checks each character before consuming it:
<span>((?:(?!\bred\b).)*(?<colour>\bred\b)?.*)<\/span>
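Here is a sketch of that lookahead pattern in Python (an assumption about language). Python's re module only accepts the (?P<name>...) spelling for named groups, so (?<colour>...) is rewritten as (?P<colour>...); the pattern is otherwise unchanged apart from dropping the \/ escape.

```python
import re

# (?:(?!\bred\b).)* consumes characters as long as a whole-word "red" does not start there,
# so it stops right before "red" and the optional colour group can then capture it.
pattern = re.compile(r"<span>((?:(?!\bred\b).)*(?P<colour>\bred\b)?.*)</span>")

for s in ["<span>I love my red car.</span>", "<span>I love my car.</span>"]:
    m = pattern.search(s)
    # group(1) is the whole text inside the span; "colour" is "red" or None.
    print(repr(m.group(1)), m.group("colour"))
```

This prints 'I love my red car.' red for the first sentence and 'I love my car.' None for the second.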
As a side note, I would advise using word boundaries (\b) so that you don't match things like reduce or jarred.
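To illustrate the side note, this Python sketch (language is an assumption) runs the lookahead pattern with and without \b against a sentence containing "reduce"; the sample sentence is made up for illustration.

```python
import re

# Same shape as the lookahead pattern, once with and once without word boundaries.
with_boundaries = re.compile(r"<span>((?:(?!\bred\b).)*(?P<colour>\bred\b)?.*)</span>")
without_boundaries = re.compile(r"<span>((?:(?!red).)*(?P<colour>red)?.*)</span>")

# Hypothetical sample: "reduce" starts with "red" but is not the color.
text = "<span>I want to reduce my costs.</span>"

print(with_boundaries.search(text).group("colour"))
print(without_boundaries.search(text).group("colour"))
```

With boundaries the colour group stays None; without them it wrongly captures the "red" at the start of "reduce".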