It seems that this is a huge source of confusion for beginners writing regular expressions, can cause hidden performance problems, and it would seem that a typical use case would be non-greedy.
Is this just for legacy reasons (it was how it was first done, and every implementation copies that), or is there a reason for it?
The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient. Instead of matching till the first occurrence of '>', it extracted the whole string.
If you've ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regex are what you need. By default, Perl regular expressions are greedy, meaning they will match as much data as possible before a new line.
You make it non-greedy by using ". *?" When using the latter construct, the regex engine will, at every step it matches text into the "." attempt to match whatever make come after the ". *?" . This means that if for instance nothing comes after the ".
A greedy match means that the regex engine (the one which tries to find your pattern in the string) matches as many characters as possible. For example, the regex 'a+' will match as many 'a' s as possible in your string 'aaaa' .
Part of the answer may involve the origins of REs in practical computing. They were originally a theoretical concept from automata theory and formal language theory until Ken Thompson himself wrote a real implementation and used them in qed and ed(1).
The original version had only the greedy syntax and so there wasn't a decision to even make.
In the case of performance, lazy quantifiers aren't always faster because of backtracking: http://blog.stevenlevithan.com/archives/greedy-lazy-performance
As for the actual design, I honestly can't say why quantifiers are greedy by default but I do wonder what control character would have been used to make a quantifier greedy instead of lazy. I don't think ?
would have cut it :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With