Regular expression .*? vs .*

Tags:

regex

php

I came across a PHP article about regular expressions which used (.*?) in its syntax. As far as I can see, it behaves just like (.*).

Is there any advantage to using (.*?)? I can't really see why someone would use it.

Tiddo asked Dec 07 '22 23:12



1 Answer

In most flavours of regex, the *? quantifier is a non-greedy (lazy) repeat. This means that .*? first matches the empty string, then, if the rest of the pattern fails, one character, and so on until the overall match succeeds. In contrast, the greedy .* first matches as many characters as it can (with default flags, everything up to the end of the line) and then, if the rest of the pattern fails, gives back one character at a time.
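A small sketch of that search order, written in Python rather than PHP: Python's `re` module is also a backtracking engine, and for these simple patterns it behaves like PHP's PCRE-based `preg_*` functions. The subject strings are made up for illustration.

```python
import re

subject = "abcabc"

# Lazy: '.*?' first tries zero characters; nothing follows it in the pattern,
# so the overall match is just "a".
lazy_only = re.match(r"a.*?", subject).group()      # "a"
# Greedy: '.*' first grabs everything it can, so the match runs to the end.
greedy_only = re.match(r"a.*", subject).group()     # "abcabc"

# With a 'c' after the repeat, lazy grows one character at a time until 'c'
# can match (the first 'c'); greedy starts at the end and gives characters
# back until 'c' can match (the last 'c').
lazy_to_c = re.match(r"a.*?c", subject).group()     # "abc"
greedy_to_c = re.match(r"a.*c", subject).group()    # "abcabc"

# By default '.' does not match a newline, so "everything" stops at the line end.
greedy_line = re.match(r"a.*", "abc\ndef").group()  # "abc"

print(lazy_only, greedy_only, lazy_to_c, greedy_to_c, repr(greedy_line))
```

Note that the two forms only produce different matches when something in the pattern follows the repeat (or when nothing does, as in `a.*?` versus `a.*`).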

This difference in search order matters mostly for engines that use backtracking to match ambiguous expressions. When the whole input has to match, the two forms accept exactly the same strings, but because they try candidate lengths in opposite orders, one can be much faster than the other, depending on where the rest of the pattern finds its match.
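The speed difference can be illustrated with a deliberately simplified toy model (not how PCRE or Python's `re` is implemented internally; real engines add optimizations that change the exact counts). The hypothetical `attempts` function counts how many lengths a backtracking engine would try for the `.*` part of `.*x` or `.*?x`, anchored at the start of a newline-free string, before the literal `x` matches. A final check with `re.fullmatch` shows that both forms accept the same strings when the whole input must match.

```python
import re

def attempts(subject: str, stop: str, lazy: bool) -> int:
    """Toy model: count the lengths tried for '.*' (greedy) or '.*?' (lazy)
    before the literal `stop` matches right after it, anchored at position 0.
    Assumes `subject` has no newlines, so '.' can consume every character."""
    n = len(subject)
    # Lazy tries 0, 1, 2, ... characters; greedy tries n, n-1, ..., 0.
    order = range(0, n + 1) if lazy else range(n, -1, -1)
    tries = 0
    for length in order:
        tries += 1
        if subject.startswith(stop, length):
            return tries
    return tries  # no length worked: every alternative was tried and the match fails

# 'x' at the start: lazy succeeds at once, greedy backtracks across the whole string.
early = "x" + "a" * 1000
# 'x' at the end: greedy succeeds almost at once, lazy creeps forward.
late = "a" * 1000 + "x"

print(attempts(early, "x", lazy=True), attempts(early, "x", lazy=False))  # 1 1002
print(attempts(late, "x", lazy=True), attempts(late, "x", lazy=False))    # 1001 2

# Both forms accept exactly the same strings when the whole input must match.
samples = ["x", "abcx", "axbx", "abc", "", "xx"]
same = all(
    bool(re.fullmatch(r".*?x", s)) == bool(re.fullmatch(r".*x", s)) for s in samples
)
print(same)  # True
```

Neither form is faster in general; which one wins depends on where the text that follows the repeat tends to appear in the input.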

Laziness is also useful when capture groups (in backtracking and NFA-style engines alike) are used to extract information from a match. For instance, an expression like

"(.*?)"

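can be used to capture a quoted string. Because the group is non-greedy, the match ends at the first closing quote, so the group never captures a quote character and contains only the desired content. A greedy "(.*)" would instead run to the last quote on the line, swallowing any quotes in between.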
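A minimal sketch of the quoted-string case, again in Python (assuming, as above, that PHP's `preg_match_all` behaves the same way for these patterns); the sample sentence is made up. The negated character class `[^"]*` is shown as a common alternative that gets the lazy result without relying on laziness.

```python
import re

text = 'He said "hi" and then "bye".'

# Lazy: each group stops at the first closing quote after the opening one.
lazy = re.findall(r'"(.*?)"', text)
# Greedy: the group runs to the last quote on the line, swallowing the quotes in between.
greedy = re.findall(r'"(.*)"', text)
# Negated class: the group simply cannot contain a quote, so laziness is not needed.
negated = re.findall(r'"([^"]*)"', text)

print(lazy)     # ['hi', 'bye']
print(greedy)   # ['hi" and then "bye']
print(negated)  # ['hi', 'bye']
```

One difference worth knowing: `[^"]*` also matches newlines, while `.` does not by default, so the two are not interchangeable for multi-line quoted strings.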

SingleNegationElimination answered Dec 10 '22 11:12
