Suppose we have this HTML content and want to extract Content1, Content2, ... with a regular expression.
<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>
If I use the line below:
preg_match_all('/<li>(.*)<\/li>/', $text, $result);
I get an array with a single match containing:
Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4
And by using this code:
preg_match_all('/<li>(.*?)<\/li>/', $text, $result);
I get an array with 4 matches containing Content1, Content2, ...
Why doesn't (.*) work, given that it means "match any character zero or more times"?
. matches any single character in regular expressions (by default, any character except a newline). * means zero or more occurrences of the single element preceding it.
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- one character that is in the range a-z OR 0-9. (a-z0-9) -- a capturing group that matches the literal text a-z0-9.
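The distinctions above are easy to check directly. The following is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the question.

```php
<?php
// '*' applies only to the single element before it.
preg_match('/ab*/', 'abbbab', $m1);    // b* repeats just the 'b'
preg_match('/(ab)*/', 'ababx', $m2);   // the group makes 'ab' the repeated unit

// [a-z0-9] is a character class: exactly one character from the set.
$classHit = preg_match('/^[a-z0-9]$/', '7');

// (a-z0-9) is a capturing group around the literal text "a-z0-9".
$groupMiss = preg_match('/(a-z0-9)/', 'abc123');
preg_match('/(a-z0-9)/', 'range a-z0-9 here', $m3);

echo $m1[0], "\n";      // abbb
echo $m2[0], "\n";      // abab
echo $classHit, "\n";   // 1
echo $groupMiss, "\n";  // 0
echo $m3[1], "\n";      // a-z0-9
```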
To perform a substitution in .NET, you use the Replace method of the Regex class instead of the Match method. Replace is similar to Match, except that it takes an extra string parameter for the replacement value.
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.
* matches in a greedy fashion; *? matches in a non-greedy fashion.
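To see what "giving back as necessary" looks like in practice, here is a small sketch. The subject string 12345 and the two-group patterns are illustrative assumptions, chosen so that the capture groups show where each quantifier stopped.

```php
<?php
$subject = '12345';

// Greedy: \d+ first takes all five digits, then gives one back
// so that the trailing \d still has something to match.
preg_match('/(\d+)(\d)/', $subject, $greedy);

// Non-greedy: \d+? takes a single digit and stops as soon as
// the rest of the pattern can match.
preg_match('/(\d+?)(\d)/', $subject, $lazy);

echo $greedy[1], ' ', $greedy[2], "\n"; // 1234 5
echo $lazy[1], ' ', $lazy[2], "\n";     // 1 2
```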
What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
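The difference can be reproduced with the two patterns from the question. This sketch assumes the list items sit on a single line; that matters because . does not match a newline unless the s modifier is set, so the second half of the example shows what changes when each item is on its own line. The variable names are illustrative.

```php
<?php
// Assumption: all items are on one line, so '.' can run across them.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

print_r($greedy[1]); // one element: everything from Content1 up to Content4
print_r($lazy[1]);   // four elements: Content1, Content2, Content3, Content4

// With one item per line, '.' stops at each newline unless the
// s modifier lets it match newlines too.
$lines = "<li>Content1</li>\n<li>Content2</li>";
$perLine = preg_match_all('/<li>(.*)<\/li>/', $lines, $m1);  // 2 matches
$dotAll  = preg_match_all('/<li>(.*)<\/li>/s', $lines, $m2); // 1 match
```

In other words, the single-match result described in the question appears when the whole list is on one line or when the s modifier is in use; the non-greedy pattern returns one match per item in either case.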
This happens because .* is greedy: it consumes as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*?, on the other hand, is not greedy and consumes as little as possible (stopping at the first </li>).