 

(.*) instead of (.*?)

Tags: regex, php

Suppose we have this HTML content and want to extract Content1, Content2, ... with a regular expression:

<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>

If I use the line below:

preg_match_all('/<li>(.*)<\/li>/', $text, $result);

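I get an array with a single match containing the following (this assumes the items are all on one line, or that the s modifier is used; by default . does not match a newline):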

Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4

And by using this code:

preg_match_all('/<li>(.*?)<\/li>/', $text, $result);

I get an array with four matches: Content1, Content2, ...
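A minimal, self-contained PHP sketch that reproduces the two results above. It assumes the four items sit on a single line; the variable names $greedy and $lazy are just illustrative.

```php
<?php
// The four list items from the question, assumed to be on one line.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

// Greedy: (.*) runs to the LAST </li>, so there is only one match.
preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);

// Lazy: (.*?) stops at the FIRST </li> after each <li>, giving four matches.
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

// Index 1 holds the text captured by the (...) group for every match.
print_r($greedy[1]); // one element: 'Content1</li><li>Content2</li><li>Content3</li><li>Content4'
print_r($lazy[1]);   // four elements: 'Content1' ... 'Content4'
```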

Why doesn't (.*) work, given that it means "match any character zero or more times"?

asked Apr 07 '10 11:04 by EBAG


People also ask

What does .* mean in regex?

. matches any single character (by default, any character except a newline). * means zero or more occurrences of the single token preceding it.
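A short PHP sketch of both points: * repeats only the token directly before it, and . does not cross a newline unless the s (dotall) modifier is added. The subject strings are made up for illustration.

```php
<?php
// '*' repeats only the single token before it: in /ab*/ that is 'b', not 'ab'.
preg_match('/ab*/', 'abbbab', $m1);
echo $m1[0], "\n"; // abbb

// A (non-capturing) group makes '*' repeat the whole sequence 'ab'.
preg_match('/(?:ab)*/', 'ababx', $m2);
echo $m2[0], "\n"; // abab

// By default '.' does not match a newline...
$multi = "<li>A</li>\n<li>B</li>";
preg_match_all('/<li>(.*)<\/li>/', $multi, $noS);
// ...so even the greedy pattern yields one match per line: ['A', 'B'].

// With the s modifier, '.' also matches "\n" and the greedy
// pattern spans both lines, producing a single match.
preg_match_all('/<li>(.*)<\/li>/s', $multi, $withS);
// $withS[1] is ["A</li>\n<li>B"].
```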

What is the difference between [] and () in regex?

[] denotes a character class; () denotes a capturing group. [a-z0-9] matches exactly one character in the range a-z or 0-9. (a-z0-9) captures the literal text a-z0-9.
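A small PHP sketch contrasting the two; the subject strings are hypothetical examples.

```php
<?php
// [a-z0-9] is a character class: it matches exactly ONE character from the set.
preg_match('/[a-z0-9]/', 'X7y', $class);
// $class[0] is '7' ('X' is uppercase, so it is skipped).

// (a-z0-9) is a capturing group around the literal text "a-z0-9";
// outside [...] the hyphen has no special meaning.
preg_match('/(a-z0-9)/', 'range a-z0-9 here', $group);
// $group[1] (the captured text) is 'a-z0-9'.

// Combined: a group that captures a run of class characters.
preg_match('/([a-z0-9]+)/', 'ID: abc123!', $both);
// $both[1] is 'abc123'.
```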

How do you substitute in regex?

In .NET, you perform a substitution with the Replace method of the Regex class instead of the Match method. Replace is similar to Match, except that it takes an extra string parameter for the replacement value. In PHP, the equivalent is preg_replace().
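A minimal PHP sketch using preg_replace() with the patterns from the question; the bracketed replacement format is just an illustration.

```php
<?php
$text = '<li>Content1</li><li>Content2</li>';

// preg_replace(pattern, replacement, subject): $1 in the replacement refers
// to the text captured by the first group of the pattern.
$lazy = preg_replace('/<li>(.*?)<\/li>/', '[$1]', $text);
// Each item is replaced separately: '[Content1][Content2]'

// The greedy version swallows both items in one match.
$greedy = preg_replace('/<li>(.*)<\/li>/', '[$1]', $text);
// '[Content1</li><li>Content2]'
```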

Are regexes greedy?

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.
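"Only giving back as necessary" describes backtracking. This PHP sketch contrasts the greedy * with PCRE's possessive *+ (not mentioned in the text above), which never gives characters back.

```php
<?php
$text = '<li>a</li>';

// Greedy .* first consumes "a</li>" (everything to the end), then gives back
// characters one at a time until <\/li> can match.
$greedyCount = preg_match('/<li>(.*)<\/li>/', $text, $m);
// $greedyCount is 1 and $m[1] is 'a'.

// The possessive .*+ also consumes "a</li>" but never gives any of it back,
// so the trailing <\/li> has nothing left to match and the match fails.
$possessiveCount = preg_match('/<li>(.*+)<\/li>/', $text);
// $possessiveCount is 0.
```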


2 Answers

* matches in a greedy fashion, *? matches in a non-greedy fashion.

What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
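To check where each match stops, this PHP sketch uses PREG_OFFSET_CAPTURE to compute the end offset of the first match. The helper matchEnd() is hypothetical, written only for this illustration.

```php
<?php
$text = '<li>A</li><li>B</li>';

// Hypothetical helper: offset just past the end of the first match.
function matchEnd(string $pattern, string $text): int
{
    preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE);
    // $m[0] holds [matched text, start offset].
    return $m[0][1] + strlen($m[0][0]);
}

$greedyEnd = matchEnd('/<li>(.*)<\/li>/', $text);
$lazyEnd   = matchEnd('/<li>(.*?)<\/li>/', $text);

// Greedy ends right after the LAST "</li>", lazy right after the FIRST one
// (strlen('</li>') is 5).
echo $greedyEnd, ' ', strrpos($text, '</li>') + 5, "\n"; // 20 20
echo $lazyEnd, ' ', strpos($text, '</li>') + 5, "\n";    // 10 10
```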

answered Sep 17 '22 15:09 by Thomas


Because .* itself is greedy and eats up as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*? on the other hand is not greedy and eats up as little as possible (stopping at first </li>).
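As an aside not covered in the answers, PHP's preg_* functions also accept a U (ungreedy) modifier that inverts the default: a plain * becomes lazy and *? becomes greedy. A minimal sketch:

```php
<?php
$text = '<li>Content1</li><li>Content2</li>';

// With U, a plain * is lazy, so (.*) behaves like (.*?) without U.
preg_match_all('/<li>(.*)<\/li>/U', $text, $ungreedy);
// $ungreedy[1] is ['Content1', 'Content2'].

// Under U, *? flips back to greedy, giving a single match.
preg_match_all('/<li>(.*?)<\/li>/U', $text, $flipped);
// $flipped[1] is ['Content1</li><li>Content2'].
```

Writing (.*?) explicitly is usually clearer than relying on U, since a reader sees the laziness in the pattern itself.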

answered Sep 18 '22 15:09 by Matteo Riva