preg_replace all links in file_get_contents not containing a word [duplicate]

Question

I'm reading a page into a variable and I would like to disable all links that do not contain the word "remedy" in the address. The code I have so far grabs all the links including ones with "remedy". What am I doing wrong?

$page = preg_replace('~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i', '<font color="#808080">$1</font>', $page);

-- solution --

$page = preg_replace('~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i', '<font color="#808080">$2</font>', $page);

Matmarbon · Accepted Answer

Try ~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i

As for what you are doing wrong: a regex matches whenever there is any possible way to match, and for every URL (even one containing remedy) there is a way to match '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i'. You did not specify that remedy may not appear anywhere in the attribute; you specified that there must be some position, after anything or nothing (.*?), that is not immediately followed by remedy. Such a position exists in every URL: even when the address begins with remedy, the first .*? can consume one more character so that the lookahead is tested one position later, where it succeeds.
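To make that concrete, here is a small sketch that runs the question's original pattern against three sample links. The links are made up for illustration; only the pattern comes from the question.

```php
<?php
// The question's original pattern: the lookahead is tested at a single position only.
$original = '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i';

// Hypothetical sample links, invented for this illustration.
$links = [
    '<a href="/remedy/list.html">Remedies</a>',
    '<a href="remedy.html">Start</a>',
    '<a href="/about.html">About</a>',
];

$matched = [];
foreach ($links as $link) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $matched[] = preg_match($original, $link);
}

foreach ($links as $i => $link) {
    echo $matched[$i] ? 'match:    ' : 'no match: ', $link, PHP_EOL;
}
```

All three links match, including the two whose address contains remedy, which is the behaviour described in the question.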

Alan Moore · Answer

I would probably use this:

<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>

The most interesting part is this:

"(?:(?!remedy)[^"])*"

Each time the [^"] is about to consume another character, it yields to the lookahead so it can confirm that it's not the first character of the word remedy. Using [^"] instead of . prevents it from looking at anything beyond the closing quote. I also took the liberty of replacing your .*?s with negated character classes. This serves the same purpose, keeping the match "corralled" in the area where you want it to match. It's also more efficient and more robust.
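A minimal sketch of that pattern in use with preg_replace follows. The answer shows the pattern bare, so the ~ delimiters and the i flag are assumptions carried over from the question, and the sample $page string is invented. The last lines compare it with the (.(?!remedy))*? variant, which runs its lookahead after consuming each character rather than before.

```php
<?php
// Pattern from this answer, wrapped in ~ delimiters with the i flag (assumed, as in the question).
$pattern = '~<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>~i';
$replacement = '<font color="#808080">$1</font>';

// Hypothetical page fragment, invented for this illustration.
$page = '<a href="/remedy/list.html">Remedies</a> '
      . '<a href="remedy.html">Start</a> '
      . '<a href="/about.html" class="nav">About</a>';

// Only the link whose address lacks "remedy" should be greyed out.
$result = preg_replace($pattern, $replacement, $page);
echo $result, PHP_EOL;

// Comparison: the lookahead-after-the-character variant never tests the
// position right after the opening quote, so an address that begins with
// "remedy" still matches it.
$afterChar = '~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i';
$startsWithRemedy = '<a href="remedy.html">Start</a>';
$afterCharHit = preg_match($afterChar, $startsWithRemedy);
$beforeCharHit = preg_match($pattern, $startsWithRemedy);
echo "after-char variant: $afterCharHit, before-char variant: $beforeCharHit", PHP_EOL;
```

Putting the lookahead in front of the character class means the very first character of the address is checked too, which is the design difference between the two patterns.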

Of course, I'm assuming the <a> element's content is plain text, with no more elements nested inside it. In fact, that's just one of many simplifying assumptions I've made. You can't match HTML with regexes without them.
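As a rough illustration of those simplifying assumptions, the sketch below feeds the same pattern three variations of one link. The inputs are invented; the delimiters and i flag are again assumed from the question.

```php
<?php
// Same pattern as above, with assumed ~ delimiters and i flag.
$pattern = '~<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>~i';

// Hypothetical inputs: plain text content, nested markup, single-quoted attribute.
$plain  = '<a href="/about.html">About</a>';
$nested = '<a href="/about.html"><b>About</b></a>';
$single = "<a href='/about.html'>About</a>";

$plainHit  = preg_match($pattern, $plain);   // content is plain text
$nestedHit = preg_match($pattern, $nested);  // [^<]* cannot cross the <b> tag
$singleHit = preg_match($pattern, $single);  // pattern expects a double quote after href=

echo "plain: $plainHit, nested: $nestedHit, single-quoted: $singleHit", PHP_EOL;
```

Only the plain link matches. Links with nested elements or single-quoted attributes are silently left alone, so whether the regex is good enough depends on how uniform the fetched page's markup is.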

Tags: regex, php, preg-replace, negative-lookahead

user2001487
