I have a function that replaces the href attribute of anchors in a string using PHP's DOMDocument. Here's a snippet:
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($text);
$anchors = $doc->getElementsByTagName('a');
foreach ($anchors as $a) {
    $a->setAttribute('href', 'http://google.com');
}
return $doc->saveHTML();
The problem is that loadHTML($text) surrounds the $text in doctype, html, body, etc. tags. I tried working around this by doing this instead of loadHTML():
$doc = new DOMDocument('1.0', 'UTF-8');
$node = $doc->createTextNode($text);
$doc->appendChild($node);
...
Unfortunately, this encodes all the entities (anchors included). Does anyone know how to turn this off? I've already thoroughly looked through the docs and tried hacking it, but can't figure it out.
Thanks! :)
XML has only a few predefined entities. All your HTML entities are defined somewhere else. When you use loadHTML(), these entity definitions are loaded automatically; with loadXML() (or no load() at all) they are not.
createTextNode() does exactly what the name suggests. Everything you pass as the value is treated as text content, not as markup. That is, if you pass something that has a special meaning in markup (<, >, ...), it is encoded so that a parser can distinguish the text from actual markup (&lt;, &gt;, ...).
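To make that concrete, here is a minimal sketch of the escaping behaviour. The wrapper div and the sample string are assumptions added for illustration; they are not part of the original snippet.

```php
<?php
// Sketch: markup passed to createTextNode() is stored as plain text.
$doc = new DOMDocument('1.0', 'UTF-8');

// Hypothetical wrapper element so the text node has an element parent.
$wrapper = $doc->createElement('div');
$doc->appendChild($wrapper);

// The string looks like an anchor, but it is added as character data only.
$wrapper->appendChild($doc->createTextNode('<a href="#">link</a>'));

// Serialize just the wrapper; the angle brackets come out escaped.
$out = $doc->saveHTML($wrapper);

// No <a> element was created, so there is nothing for setAttribute() to act on.
$anchorCount = $doc->getElementsByTagName('a')->length;
```

Because the document never contains an actual a element here, getElementsByTagName('a') finds nothing, which is why the href replacement cannot work with this approach.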
Where does $text come from? Can't you do the replacement within the actual HTML document?
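If $text really has to be processed as a standalone fragment, one possible workaround is to keep loadHTML() and serialize only the nodes you care about instead of the whole document. The following is a rough sketch, not a definitive implementation: the function name replaceHrefs and the wrapper div are hypothetical, and it assumes $text does not contain stray closing div tags and that saveHTML() accepts a node argument (PHP 5.3.6 or later).

```php
<?php
// Hypothetical helper: rewrite every anchor's href in an HTML fragment
// and return the fragment without doctype/html/body wrappers.
function replaceHrefs(string $text, string $href): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');

    // Silence parser warnings for imperfect markup, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    // Assumption: wrap the fragment in a div so it has a single known root.
    $doc->loadHTML('<div>' . $text . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $href);
    }

    // The wrapper is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Serialize only the wrapper's children, skipping the added scaffolding.
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$result = replaceHrefs('Hello <a href="http://example.com">link</a> world', 'http://google.com');
```

Note that loadHTML() may still misread non-ASCII input unless the encoding is declared in the markup, so treat this as a starting point rather than a complete solution.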