I am making a preg_replace on html page. My pattern is aimed to add surrounding tag to some words in html. However, sometimes my regular expression modifies html tags. For example, when I try to replace this text:
<a href="example.com" alt="yasar home page">yasar</a>
So that yasar
reads <span class="selected-word">yasar</span>
, my regular expression also replaces yasar in alt attribute of anchor tag. Current preg_replace()
I am using looks like this:
preg_replace("/(asf|gfd|oyws)/", '<span class=something>${1}</span>',$target);
How can I make a regular expression, so that it doesn't match anything inside a html tag?
You can use an assertion for that, as you just have to ensure that the searched words occur somewhen after an >
, or before any <
. The latter test is easier to accomplish as lookahead assertions can be variable length:
/(asf|foo|barr)(?=[^>]*(<|$))/
See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
Yasar, resurrecting this question because it had another solution that wasn't mentioned.
Instead of just checking that the next tag character is an opening tag, this solution skips all <full tags>
.
With all the disclaimers about using regex to parse html, here is the regex:
<[^>]*>(*SKIP)(*F)|word1|word2|word3
Here is a demo. In code, it looks like this:
$target = "word1 <a skip this word2 >word2 again</a> word3";
$regex = "~<[^>]*>(*SKIP)(*F)|word1|word2|word3~";
$repl= '<span class="">\0</span>';
$new=preg_replace($regex,$repl,$target);
echo htmlentities($new);
Here is an online demo of this code.
Reference
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With