I'm reading a page into a variable and I would like to disable all links that do not contain the word "remedy" in the address. The code I have so far grabs all the links including ones with "remedy". What am I doing wrong?
$page = preg_replace('~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i', '<font color="#808080">$1</font>', $page);
-- solution --
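$page = preg_replace('~<a href="((?!remedy).)*?".*?>(.*?)</a>~i', '<font color="#808080">$2</font>', $page);
Try ~<a href="((?!remedy).)*?".*?>(.*?)</a>~i. The lookahead goes before the dot so that it is also checked at the very first character of the address; with (.(?!remedy))*? an address that begins with remedy would still match.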
As for what you are doing wrong: a regex will find a match whenever one is possible at all, and for every URL (even one containing remedy) it is possible to match '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i'. You did not specify that remedy may not appear anywhere in the attribute. You specified that there must be some position, after anything or nothing (.*?), that is not immediately followed by remedy. Such a position exists in every URL: even when the address begins with remedy, the first .*? can consume the r, and the lookahead then sees emedy and succeeds. I hope that makes sense.
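To see this in practice, here is a minimal sketch that runs the pattern from the question against three sample links. The links are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample links; none of them appears in the original question.
$links = [
    '<a href="http://example.com/remedy/42">Remedy</a>',
    '<a href="remedy.html">Starts with remedy</a>',
    '<a href="http://example.com/about">About</a>',
];

// The pattern from the question, unchanged.
$original = '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i';

$matched = [];
foreach ($links as $link) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $matched[] = preg_match($original, $link);
}

print_r($matched); // every entry is 1: all three links match
```

All three links match, including the two whose address contains remedy, which is why every link ends up disabled.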
I would probably use this:
<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>
The most interesting part is this:
"(?:(?!remedy)[^"])*"
Each time the [^"] is about to consume another character, it yields to the lookahead so it can confirm that the character is not the start of the word remedy. Using [^"] instead of . prevents the match from looking at anything beyond the closing quote. I also took the liberty of replacing your .*?s with negated character classes. This serves the same purpose, keeping the match "corralled" in the area where you want it to match. It's also more efficient and more robust.
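As a rough sketch of how that pattern could be dropped into the preg_replace call from the question: the ~ delimiters and the i flag are carried over from the question, and the sample markup is hypothetical.

```php
<?php
// Hypothetical page fragment, for illustration only.
$page = '<a href="http://example.com/remedy/42">Keep</a> '
      . '<a href="remedy.html">Keep too</a> '
      . '<a href="http://example.com/about" class="nav">Disable</a>';

// The pattern from this answer, wrapped in the question's delimiters and flag.
$pattern = '~<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>~i';

// $1 is the link text, because the lookahead group is non-capturing.
$page = preg_replace($pattern, '<font color="#808080">$1</font>', $page);

echo $page;
```

Only the third link is rewritten. The first two are left alone because the lookahead fails as soon as the address reaches the r of remedy, even when that is the very first character.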
Of course, I'm assuming the <a> element's content is plain text, with no more elements nested inside it. In fact, that's just one of many simplifying assumptions I've made. You can't match HTML with regexes without them.
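If those assumptions are too restrictive, one alternative is to swap the regex for PHP's bundled DOM extension. This is a different technique from the one in the answers above, shown only as a sketch; the sample page and the variable names are hypothetical.

```php
<?php
// Hypothetical page; the second link has a nested element inside it.
$page = '<html><body><p>'
      . '<a href="http://example.com/remedy/1">keep</a> '
      . '<a href="http://example.com/other"><b>drop</b></a>'
      . '</p></body></html>';

libxml_use_internal_errors(true); // real pages are rarely perfectly valid
$doc = new DOMDocument();
$doc->loadHTML($page);

// Copy the live node list into an array, since nodes are replaced while looping.
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
    if (stripos($a->getAttribute('href'), 'remedy') !== false) {
        continue; // the address contains "remedy": leave the link alone
    }
    $font = $doc->createElement('font');
    $font->setAttribute('color', '#808080');
    // Move the link's children (text or nested elements) into the <font> element.
    while ($a->firstChild) {
        $font->appendChild($a->firstChild);
    }
    $a->parentNode->replaceChild($font, $a);
}

$result = $doc->saveHTML();
echo $result;
```

Because the parser works on elements rather than characters, nested tags and attribute order inside the link no longer matter. Note that saveHTML() serializes the whole document, so its output may not be byte-for-byte identical to the input outside the changed links.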