Make one or zero regex operator greedy

I have two sentences as input. Let's say for example:

<span>I love my red car.</span>
<span>I love my car.</span>

Now I want to match all the text inside the span tags and, if present, the color.

If I use the following regex:

/<span>(.*?)(?P<color>red)(.*?)<\/span>/ms

Only the line with the color is matched. So I thought I'd use the ? operator (for one or zero).

/<span>(.*?)(?P<color>red)?(.*?)<\/span>/ms

Now both lines/sentences are matched. Sadly, the color isn't captured anymore.

The question is: why? By using ".*?" before the color part, I thought I had made the regex non-greedy, so that the color part would match if it exists. But as described above, it doesn't...

netblognet asked Sep 18 '13 07:09


1 Answer

The first (.*?) starts out matching the empty string between > and I. Since it's lazy, the engine tests the next part of the regex immediately: (?P<color>red)?. There's no red at that position, so the 'zero' option of ? applies and the regex continues to the next part, which is the second (.*?). That one also starts out matching the empty string between > and I, and since it's lazy too, the engine checks the next part of the regex: <\/span> (I'm taking it as a whole). That fails, so the second (.*?) is extended one character at a time until <\/span> can match.

So the second (.*?) will match all the way there.

Indeed, your results[1] will be empty, as will results[color] (I don't remember if you have to quote color or not), and results[3] will contain "I love my red car.".
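To see this behaviour for yourself, here is a minimal sketch in Python (an assumption on my part, since the question doesn't say which language is used; the helper name groups_for is made up for illustration). Python's re module accepts the same (?P<color>...) syntax and reports a group that took no part in the match as None:

```python
import re

# The pattern from the question, with the optional colour group.
# re.M and re.S mirror the /ms flags used in the question.
PATTERN = re.compile(r"<span>(.*?)(?P<color>red)?(.*?)</span>", re.M | re.S)


def groups_for(text):
    """Return (group 1, color, group 3) for the first match in text."""
    m = PATTERN.search(text)
    return m.group(1), m.group("color"), m.group(3)


with_color = groups_for("<span>I love my red car.</span>")
without_color = groups_for("<span>I love my car.</span>")

print(with_color)     # ('', None, 'I love my red car.')
print(without_color)  # ('', None, 'I love my car.')
```

In both cases the first group is empty, the colour group never participates, and the last group swallows the whole sentence.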

One workaround is to use alternation (OR), as NickC mentioned in his answer. Another is to use a negative lookahead that checks each character:

<span>((?:(?!\bred\b).)*(?<colour>\bred\b)?.*)<\/span>

regex101 demo
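As a rough check of the lookahead workaround, here is a small Python sketch (Python is my assumption, and text_and_colour is a made-up helper name). The only change to the pattern is that Python's re module spells named groups (?P<name>...), so (?<colour>...) becomes (?P<colour>...):

```python
import re

# Same workaround pattern, with the named group written in Python's
# (?P<name>...) syntax. No flags: "." does not cross newlines here.
WORKAROUND = re.compile(
    r"<span>((?:(?!\bred\b).)*(?P<colour>\bred\b)?.*)</span>"
)


def text_and_colour(line):
    """Return (text inside the span, colour or None) for one line."""
    m = WORKAROUND.search(line)
    return m.group(1), m.group("colour")


print(text_and_colour("<span>I love my red car.</span>"))
# ('I love my red car.', 'red')
print(text_and_colour("<span>I love my car.</span>"))
# ('I love my car.', None)
```

The (?:(?!\bred\b).)* part consumes characters only while the word red does not start at the current position, so the optional group gets a real chance to match before the trailing .* takes the rest.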

As a side note, I would advise using the word boundaries so that you don't match things like reduce or jarred.
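A quick way to see what the word boundaries buy you, again sketched in Python under the same assumption (LOOSE, STRICT and has_colour are illustrative names only):

```python
import re

LOOSE = re.compile(r"red")       # matches "red" anywhere, even inside words
STRICT = re.compile(r"\bred\b")  # matches "red" only as a whole word


def has_colour(pattern, text):
    """Return True if pattern finds a match anywhere in text."""
    return pattern.search(text) is not None


for word in ["red", "reduce", "jarred"]:
    print(word, has_colour(LOOSE, word), has_colour(STRICT, word))
# red True True
# reduce True False
# jarred True False
```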

Jerry answered Sep 29 '22 13:09