PHP Regular expression: exclude href anchor tags

I'm creating a simple search for my application.

I'm using PHP regular expression replacement (preg_replace) to look for a search term (case insensitive) and add <strong> tags around the search term.

preg_replace('/'.$query.'/i', '<strong>$0</strong>', $content);

Now I'm not the greatest with regular expressions. So what would I add to the regular expression to not replace search terms that are in a href of an anchor tag?

That way if someone searched "info" it wouldn't change a link to "http://something.com/this_<strong>info</strong>/index.html"

floatleft asked Apr 23 '11 22:04


1 Answer

I believe you will need conditional subpatterns for this purpose:

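$query = "link";
$query = preg_quote($query, '/'); // escape regex metacharacters in the search term

// Group 1 captures any tags (plus the text after them) that precede the term;
// group 3 captures the term itself.
$p = '/((<)(?(2)[^>]*>)(?:.*?))*?(' . $query . ')/smi';
// Put group 1 back unchanged and wrap only the term.
$r = "$1<strong>$3</strong>";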

$str = '<a href="/Link/foo/the_link.htm">'."\n".'A Link</a>'; // multi-line text
$nstr = preg_replace($p, $r,  $str);
var_dump( $nstr );

$str = 'Its not a Link'; // non-link text
$nstr = preg_replace($p, $r,  $str);
var_dump( $nstr );

Output: (view source)

string(61) "<a href="/Link/foo/the_link.htm"> 
A <strong>Link</strong></a>"
string(31) "Its not a <strong>Link</strong>"

PS: The regex above also handles multi-line input and, more importantly, it skips matches inside any HTML tag (anything enclosed in < and >), not just inside an href.
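
One caveat worth testing: the pattern above protects a tag only when another occurrence of the term follows that tag. For input such as '<a href="/link/foo.htm">Home</a>' the engine can still start a match inside the href. A minimal sketch of an alternative, assuming the content never contains a stray unescaped < or > outside of tags, is a negative lookahead that rejects any match whose next angle bracket is a closing one. The function name highlight_outside_tags is hypothetical and not part of the code above:

```php
<?php
// Hypothetical helper: wraps every occurrence of $query in <strong> tags,
// except occurrences that sit between "<" and ">" (i.e. inside a tag).
function highlight_outside_tags(string $query, string $content): string
{
    if ($query === '') {
        return $content; // nothing to highlight
    }
    $quoted = preg_quote($query, '/');
    // (?![^<>]*>) fails when the next angle bracket after the match is ">",
    // which means the match lies inside a tag.
    $pattern = '/' . $quoted . '(?![^<>]*>)/i';
    // preg_replace returns null on error; fall back to the original content.
    return preg_replace($pattern, '<strong>$0</strong>', $content) ?? $content;
}

echo highlight_outside_tags(
    'info',
    '<a href="http://something.com/this_info/index.html">More info</a>'
), "\n";
// <a href="http://something.com/this_info/index.html">More <strong>info</strong></a>
```

Because each match is checked on its own, this does not depend on the term appearing again after the tag. For markup that is not well-formed, an HTML parser is likely the safer route.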

EDIT: If you just want to exclude hrefs and not all HTML tags, then use this pattern instead of the one above:

$p = '/((<)(?(2).*?href=[^>]*>)(?:.*?))*?(' . $query . ')/smi';
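
A different way to leave only hrefs alone, sketched here under the assumption that href values are always quoted, is to let the regex consume the whole href="..." attribute first and then throw it away with the PCRE backtracking verbs (*SKIP)(*F). Note that this protects just the href value, not the rest of the tag, so it is narrower than the pattern above. The function name highlight_outside_href is hypothetical:

```php
<?php
// Hypothetical helper: highlights $query everywhere except inside a
// quoted href="..." (or href='...') attribute value.
function highlight_outside_href(string $query, string $content): string
{
    if ($query === '') {
        return $content; // nothing to highlight
    }
    $quoted = preg_quote($query, '/');
    // First alternative: match a complete quoted href attribute, then
    // (*SKIP)(*F) fails the match and stops the engine from retrying
    // anywhere inside it. Second alternative: the search term itself.
    $pattern = '/\bhref\s*=\s*(["\']).*?\1(*SKIP)(*F)|' . $quoted . '/is';
    // preg_replace returns null on error; fall back to the original content.
    return preg_replace($pattern, '<strong>$0</strong>', $content) ?? $content;
}

echo highlight_outside_href(
    'info',
    '<a href="http://something.com/this_info/index.html">info</a>'
), "\n";
// <a href="http://something.com/this_info/index.html"><strong>info</strong></a>
```

Other attributes (title, class and so on) and tag names are not protected by this sketch, so a term that also occurs there would still be wrapped.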
anubhava answered Oct 11 '22 10:10