I am trying to remove certain links depending on their id attribute, but keep the content of each link. For example, I want to turn
Some text goes <a href="http://www.domain.tdl/" id="remove">here</a>
into
Some text goes here
I have tried using the below.
$dom = new DOMDocument;
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$xp = new DOMXPath($dom);
foreach($xp->query('//a[contains(@id="remove")]') as $oldNode) {
$revised = strip_tags($oldNode);
}
$revised = mb_substr($dom->saveXML($xp->query('//body')->item(0)), 6, -7, "UTF-8");
echo $revised;
This is roughly taken from here, but it just returns the same content as $html.
Any ideas on how I could achieve this?
Here is the function I use for that:
function DOMRemove(DOMNode $from) {
    // Move each child in front of $from, preserving order.
    // A while loop also copes with a node that has no children.
    while ($from->firstChild !== null) {
        $from->parentNode->insertBefore($from->firstChild, $from);
    }
    // Remove the now-empty node.
    $from->parentNode->removeChild($from);
}
So this:
$dom = new DOMDocument;
$dom->loadHTML('Hello <a href="foo"><span>World</span></a>');
$a = $dom->getElementsByTagName('a')->item(0); // get first
DOMRemove($a);
Should give you (inside whatever wrapper elements loadHTML adds around the fragment):
Hello <span>World</span>
To get nodes with a specific ID, use XPath:
$xpath = new DOMXPath($dom);
$node = $xpath->query('//a[@id="something"]')->item(0); // first match, or null if none
if ($node !== null) {
    DOMRemove($node);
}
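Note that item(0) only handles the first match. If several links may match, one option is to loop over the whole result list. The following is only a minimal sketch: the helper names unwrapNode and unwrapAll are hypothetical (they are not part of the DOM extension), and because an id should be unique within a document, the sample markup matches on a class attribute instead.

```php
<?php
// Hypothetical helper: replace a node with its own children.
function unwrapNode(DOMNode $node): void {
    while ($node->firstChild !== null) {
        // insertBefore() moves a node that is already in the tree.
        $node->parentNode->insertBefore($node->firstChild, $node);
    }
    $node->parentNode->removeChild($node);
}

// Hypothetical helper: unwrap every node matched by an XPath expression
// and return how many nodes were unwrapped.
function unwrapAll(DOMDocument $dom, string $expression): int {
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array first, so the loop does not depend
    // on how the node list behaves while the tree is being modified.
    $matches = iterator_to_array($xpath->query($expression));
    foreach ($matches as $node) {
        unwrapNode($node);
    }
    return count($matches);
}

// Sample markup is an assumption made for this illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>One <a href="#" class="remove">two</a> and <a href="#" class="remove">three</a>.</p></body></html>');
$count = unwrapAll($dom, '//a[@class="remove"]');
echo $dom->saveHTML();
```

Returning the count is just a convenience so the caller can tell whether anything matched.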
This is an approach similar to @netcoder's answer, but it uses a different loop structure and DOMElement methods.
$html = '<html><body>This <a href="http://www.domain.tdl/" id="remove">link</a> was removed.</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[@id="remove"]') as $link) {
    // Move all link tag content to its parent node just before it.
    while ($link->hasChildNodes()) {
        $child = $link->removeChild($link->firstChild);
        $link->parentNode->insertBefore($child, $link);
    }
    // Remove the link tag.
    $link->parentNode->removeChild($link);
}
$html = $dom->saveXML();
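Called without an argument, saveXML() serializes the whole document, including the XML declaration and the html and body wrapper elements. To get back only the fragment, as the question asks, one option is to serialize just the children of the body element. A rough sketch, assuming a hypothetical helper named bodyInnerHtml (not a built-in):

```php
<?php
// Hypothetical helper: serialize only the contents of <body>.
function bodyInnerHtml(DOMDocument $dom): string {
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node and serializes only that subtree.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body>This <a href="http://www.domain.tdl/" id="remove">link</a> was removed.</body></html>');

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[@id="remove"]') as $link) {
    // Move the link's children in front of it, then drop the empty link.
    while ($link->hasChildNodes()) {
        $link->parentNode->insertBefore($link->firstChild, $link);
    }
    $link->parentNode->removeChild($link);
}

$result = bodyInnerHtml($dom);
echo $result;
```

This avoids trimming the serialized string by fixed offsets, which breaks as soon as the wrapper markup changes.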