Regex: Is Lazy Worse?

I have always written regexes like this:

<A HREF="([^"]*)" TARGET="_blank">([^<]*)</A>

but I just learned about lazy quantifiers, which let me write it like this:

<A HREF="(.*?)" TARGET="_blank">(.*?)</A>

Is there any disadvantage to using this second approach? The regex is definitely more compact (even SO parses it better).

Edit: There are two best answers here, which point out two important differences between the expressions. ysth's answer points to a weakness in the non-greedy/lazy one: the captured hyperlink could include other attributes of the A tag (definitely not good). Rob Kennedy points out a weakness in the greedy example (the one using negated character classes): anchor text cannot include other tags (definitely not okay, because it wouldn't grab all the anchor text either). So the answer is that, regular expressions being what they are, lazy and non-lazy solutions that seem the same are probably not semantically equivalent.
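To make the second weakness concrete, here is a minimal Python sketch. The sample anchor (`page.html`, the `<B>` tag) is made up for illustration and is not from any of the answers. It only shows that the negated-class version fails outright when the anchor text contains another tag, while the lazy version still matches.

```python
import re

# The two patterns from the question, unchanged.
negated = re.compile(r'<A HREF="([^"]*)" TARGET="_blank">([^<]*)</A>')
lazy = re.compile(r'<A HREF="(.*?)" TARGET="_blank">(.*?)</A>')

# Hypothetical anchor whose text contains a nested tag.
anchor = '<A HREF="page.html" TARGET="_blank">read <B>this</B> first</A>'

negated_match = negated.search(anchor)
lazy_match = lazy.search(anchor)

# [^<]* stops at the "<" of "<B>", and "</A>" does not follow, so no match.
print(negated_match)        # None
# .*? keeps expanding until "</A>" follows, so the nested tag is included.
print(lazy_match.group(2))  # read <B>this</B> first
```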

Edit: Third best answer is by Alan M about relative speed of the expressions. For the time being, I'll mark his as best answer so people give him more points :)

asked Dec 14 '08 18:12

Dan Rosenstark

2 Answers

Another thing to consider is how long the target text is, and how much of it is going to be matched by the quantified subexpression. For example, if you were trying to match the whole <BODY> element in a large HTML document, you might be tempted to use this regex:

/<BODY>.*?<\/BODY>/is

But that's going to do a whole lot of unnecessary work, matching one character at a time while effectively doing a negative lookahead before each one. You know the </BODY> tag is going to be very near the end of the document, so the smart thing to do is to use a normal greedy quantifier; let it slurp up the whole rest of the document and then backtrack the few characters necessary to match the end tag.
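As a rough way to try this yourself, here is a small Python sketch. The document is a synthetic stand-in (the filler paragraphs and the repeat count are assumptions, not real data), and the timings depend heavily on the regex engine and the input, so no particular numbers are claimed. What it does show is that both patterns find the same span when there is exactly one </BODY>.

```python
import re
import timeit

# Hypothetical stand-in for a large HTML document with a single BODY element.
document = "<HTML><BODY>" + "<P>filler text</P>" * 5000 + "</BODY></HTML>"

# Equivalent of /<BODY>.*?<\/BODY>/is and its greedy counterpart.
lazy = re.compile(r"<BODY>.*?</BODY>", re.IGNORECASE | re.DOTALL)
greedy = re.compile(r"<BODY>.*</BODY>", re.IGNORECASE | re.DOTALL)

lazy_match = lazy.search(document)
greedy_match = greedy.search(document)

# With only one </BODY> in the input, both patterns match the same text.
same_span = lazy_match.span() == greedy_match.span()
print("same span:", same_span)


def time_pattern(pattern, repeats=20):
    """Return the total seconds taken by `repeats` searches of the document."""
    return timeit.timeit(lambda: pattern.search(document), number=repeats)


print("lazy  :", time_pattern(lazy))
print("greedy:", time_pattern(greedy))
```

Note that the two patterns stop being interchangeable as soon as the input contains more than one </BODY>: the greedy one runs to the last, the lazy one stops at the first.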

In most cases you won't notice any speed difference between greedy and reluctant quantifiers, but it's something to keep in mind. The main reason why you should be judicious in your use of reluctant quantifiers is the one that was pointed out by the others: they may do it reluctantly, but they will match more than you want them to if that's what it takes to achieve an overall match.

answered Oct 29 '22 20:10

Alan Moore

The complemented character class more rigorously defines what you want to match, so whenever you can, I'd use it.

The non-greedy regex will match things you probably don't want, such as:

<A HREF="foo" NAME="foo" TARGET="_blank">foo</A>

where your first .*? matches

foo" NAME="foo
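This is easy to check. The following is a short Python sketch using the two patterns from the question and the tag above; Python's `re` module is just one convenient engine to try it in, and other flavors should behave the same way for these patterns.

```python
import re

# The two patterns from the question, unchanged.
negated = re.compile(r'<A HREF="([^"]*)" TARGET="_blank">([^<]*)</A>')
lazy = re.compile(r'<A HREF="(.*?)" TARGET="_blank">(.*?)</A>')

tag = '<A HREF="foo" NAME="foo" TARGET="_blank">foo</A>'

lazy_match = lazy.search(tag)
negated_match = negated.search(tag)

# The lazy group expands past the first closing quote until
# '" TARGET="_blank">' can finally match.
print(lazy_match.group(1))  # foo" NAME="foo
# [^"]* cannot cross a quote, so the stricter pattern rejects the tag.
print(negated_match)        # None
```

Whether rejecting the tag is what you want depends on the job, but at least the complemented character class never hands back a capture that is not a plain attribute value.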

answered Oct 29 '22 19:10

ysth

Tags:

regex

regex-greedy

non-greedy

reluctant-quantifiers
