
Are these regex patterns different?

A website I've been working on will not match data using a PHP (preg_match) regex pattern that seems to work everywhere else I've tested it. That pattern is:

<channel.*?>(.*?)</channel>

It is matched against an RSS feed that has a channel tag.

Now the server I am working on will only produce the correct result if I change it to:

<channel.*?>(.*)?</channel>

My regex isn't the best in the world so I'm wondering if anyone can tell me if there is any significant difference between the two patterns.

Small note: I realize it would probably be better to use SimpleXML etc, but this regex is from a previous application and for various reasons I am not allowed to change it.

Thanks in advance for any insights.

Vunus asked Jun 21 '12 13:06


1 Answer

In (.*)? the inner .* is a normal (greedy) star that matches zero or more characters, and the trailing ? makes the whole captured group optional. By contrast, (.*?) uses a "lazy star" ( *? ), which first tries to match nothing at all and then takes one more character at a time only until the rest of the pattern can match.

To understand the difference between a normal (greedy) star and a lazy star, look at the following example in PHP. The greedy star makes the largest match it can with the pattern it is given, while the lazy star "gives up" as soon as it has satisfied the match pattern:

$inputs = array( 'axb' , 'axxxb' , 'axbxb' , 'axbxxxb' );

// GREEDY STAR (NORMAL)
foreach( $inputs as $input )
{
  preg_match( '/a.*b/' , $input , $greedy );
  $greedy_matches[] = $greedy[0];
}

print "<pre>";
print_r( $greedy_matches );
print "</pre>";
/* 
Array
(
    [0] => axb
    [1] => axxxb
    [2] => axbxb
    [3] => axbxxxb
)
*/



// LAZY STAR
foreach( $inputs as $input )
{
  preg_match( '/a.*?b/' , $input , $lazy );
  $lazy_matches[] = $lazy[0];
}

print "<pre>";
print_r( $lazy_matches );
print "</pre>";
/* 
Array
(
    [0] => axb
    [1] => axxxb
    [2] => axb
    [3] => axb
)
*/
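
The same behaviour can be seen with the two patterns from the question. This is only a sketch: the $feed string is a made-up one-line sample (not the asker's real RSS feed), and # is used as the delimiter so the / in </channel> does not need escaping.

```php
<?php
// Hypothetical sample input: two channel elements on a single line.
$feed = '<channel>first</channel><channel>second</channel>';

// Lazy capture: (.*?) stops at the first </channel> it can reach.
preg_match('#<channel.*?>(.*?)</channel>#', $feed, $lazy);

// Greedy, optional capture: (.*)? runs on to the last </channel>.
preg_match('#<channel.*?>(.*)?</channel>#', $feed, $greedy);

echo $lazy[1], "\n";    // first
echo $greedy[1], "\n";  // first</channel><channel>second
```

When the input contains only one </channel>, both patterns capture the same text, so on such input the two should differ only in how the engine arrives at the match.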
Andrew Kozak answered Sep 22 '22 08:09