Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are regular expressions greedy by default?

It seems that this is a huge source of confusion for beginners writing regular expressions, can cause hidden performance problems, and it would seem that a typical use case would be non-greedy.

Is this just for legacy reasons (it was how it was first done, and every implementation copies that), or is there a reason for it?

like image 394
Yishai Avatar asked Feb 16 '10 17:02

Yishai


People also ask

Why is regex greedy?

The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient. Instead of matching till the first occurrence of '>', it extracted the whole string.

Is regex greedy by default?

If you've ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regex are what you need. By default, Perl regular expressions are greedy, meaning they will match as much data as possible before a new line.

How do I stop regex from being greedy?

You make it non-greedy by using ". *?" When using the latter construct, the regex engine will, at every step it matches text into the "." attempt to match whatever make come after the ". *?" . This means that if for instance nothing comes after the ".

What is greedy search in regex?

A greedy match means that the regex engine (the one which tries to find your pattern in the string) matches as many characters as possible. For example, the regex 'a+' will match as many 'a' s as possible in your string 'aaaa' .


2 Answers

Hysterical Raisens


Part of the answer may involve the origins of REs in practical computing. They were originally a theoretical concept from automata theory and formal language theory until Ken Thompson himself wrote a real implementation and used them in qed and ed(1).

The original version had only the greedy syntax and so there wasn't a decision to even make.

like image 169
DigitalRoss Avatar answered Oct 05 '22 23:10

DigitalRoss


In the case of performance, lazy quantifiers aren't always faster because of backtracking: http://blog.stevenlevithan.com/archives/greedy-lazy-performance

As for the actual design, I honestly can't say why quantifiers are greedy by default but I do wonder what control character would have been used to make a quantifier greedy instead of lazy. I don't think ? would have cut it :-)

like image 40
Andy E Avatar answered Oct 05 '22 23:10

Andy E