How can I write a regex which matches non greedy? [duplicate]

I need help with non-greedy matching in regular expressions.

The match pattern is:

<img\s.*> 

The text to match is:

<html> <img src="test"> abc <img   src="a" src='a' a=b> </html> 

I am testing on http://regexpal.com

This expression matches all the text from <img to the last >. I need it to stop at the first > encountered after the initial <img, so here I should get two matches instead of the one I get now.

I tried all combinations of non-greedy ?, with no success.

Pointer Null asked Aug 10 '12 09:08


2 Answers

The non-greedy ? works perfectly fine. The one thing to watch for is line breaks: if a tag in your input spans more than one line, you need to enable the dot-matches-all option in the regex engine you are testing with (regexpal, the tester you used, has this option too). By default, regex engines generally do not let the dot (.) match line breaks, so you have to tell them explicitly that . should match line breaks as well.

For example,

<img\s.*?> 

works fine!
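To make the difference concrete, here is a minimal sketch in Python's re module, assuming Python is an acceptable stand-in for the online tester. It runs the question's pattern in greedy and non-greedy form on the question's text, then shows the dot-matches-all option (re.DOTALL in Python) on a tag that wraps onto a second line. The wrapped sample string is made up for illustration.

```python
import re

# The text from the question (single line).
text = '<html> <img src="test"> abc <img   src="a" src=\'a\' a=b> </html>'

# Greedy: .* runs to the end of the line, then backtracks to the LAST '>'.
greedy = re.findall(r'<img\s.*>', text)

# Non-greedy: .*? stops at the FIRST '>' that lets the pattern match.
lazy = re.findall(r'<img\s.*?>', text)

print(len(greedy))  # one long match
print(lazy)         # two separate tags

# Hypothetical sample: a tag that continues on a second line.
wrapped = '<img src="a"\n alt="b">'

# Without DOTALL the dot refuses to cross the line break, so nothing matches.
without_dotall = re.findall(r'<img\s.*?>', wrapped)

# With DOTALL the dot also matches '\n' and the whole tag is found.
with_dotall = re.findall(r'<img\s.*?>', wrapped, flags=re.DOTALL)

print(without_dotall)
print(with_dotall)
```

On single-line input such as the question's text, the non-greedy pattern already yields the two matches without any extra option.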

Check the results here.

Also, read about how dot behaves in various regex flavours.

Pavan Manjunath answered Oct 11 '22 10:10


The ? modifier makes a quantifier non-greedy: .* is greedy, while .*? is not. So you can use something like <img.*?> to match the whole tag, or <img[^>]*>, which can never run past the first > because [^>] does not match one.
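A short comparison of the two patterns, sketched in Python under the assumption that its re module behaves like the tester used in the question. On the question's text both patterns return the same two tags. They differ on a tag that wraps onto a second line (a made-up sample), because the negated class [^>] matches line breaks while the dot does not by default.

```python
import re

# The text from the question.
text = '<html> <img src="test"> abc <img   src="a" src=\'a\' a=b> </html>'

lazy_pattern = re.compile(r'<img.*?>')       # non-greedy dot
negated_pattern = re.compile(r'<img[^>]*>')  # anything except '>'

lazy_matches = lazy_pattern.findall(text)
negated_matches = negated_pattern.findall(text)

# Both approaches find the same two tags on single-line input.
print(lazy_matches == negated_matches)

# Hypothetical sample: a tag split across two lines.
wrapped = '<img src="a"\n alt="b">'

# The dot stops at '\n', so the lazy pattern finds nothing here...
lazy_wrapped = lazy_pattern.findall(wrapped)

# ...while [^>] happily consumes the line break.
negated_wrapped = negated_pattern.findall(wrapped)

print(lazy_wrapped)
print(negated_wrapped)
```

Both patterns still stop too early when an attribute value itself contains a > character.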

But remember that HTML in general cannot be parsed reliably with regular expressions.
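As an illustration of that caveat, here is a minimal sketch that collects img tags with Python's standard html.parser module instead of a regex. The ImgCollector class and the sample markup are hypothetical, not part of the question. The second tag has a > inside a quoted attribute value, which cuts the regex match short but is handled by the parser.

```python
import re
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by HTMLParser.
        if tag == 'img':
            self.images.append(attrs)


# Made-up sample: the second tag has '>' inside a quoted attribute value.
html = '<html> <img src="test"> abc <img alt="a > b" src="x.png"> </html>'

collector = ImgCollector()
collector.feed(html)
collector.close()

# The regex stops at the '>' inside the alt text.
regex_matches = re.findall(r'<img[^>]*>', html)

print(regex_matches)
print(collector.images)
```

For quick one-off extraction from markup you control, the regex may be good enough; for arbitrary HTML a parser is the safer choice.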

Ilya answered Oct 11 '22 11:10