 

How can I remove DOM element tags but leave their contents?

Tags: html, dom, php, xpath

I have PHP code which removes all nodes that have at least one attribute. Here is my code:

<?php

$data = <<<DATA
<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);

$xpath = new DOMXPath($dom);

$lines_to_be_removed = $xpath->query("//*[count(@*)>0]");

foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}

// just to check
echo $dom->saveHTML();
?>

As you can see in the fiddle, this is the current output of the code above:

<div>
    <p>These line shall stay</p>

    <p>But keep this</p>

</div>

And this is the desired result:

<div>
    <p>These line shall stay</p>
    Remove this one
    <p>But keep this</p>
    and this
</div>

How can I do that?

Martin AJ asked Jan 05 '23 10:01


1 Answer

Before removing each element, move its child nodes out of it and insert them directly after it. Once the element is empty, removing it no longer loses any content.

Example:

$data = <<<DATA
<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
    <div style="color: red">and <p>also</p> this</div>
    <div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

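foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    // Move the children out back to front, each one inserted right after
    // $node, so they end up in their original order.
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    // $node is now empty, so removing it drops no content.
    $parent->removeChild($node);
}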

echo $dom->saveHTML();

Outputs:

<div>
    <p>These line shall stay</p>
    Remove this one
    <p>But keep this</p>
    and this
    and <p>also</p> this
    and this too
</div>

https://3v4l.org/9qHRM

(I added some nested elements to demonstrate the safety of this approach.)
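The same idea can be wrapped in a reusable helper. This is a minimal sketch, not the answer's exact code: it moves children front to back with insertBefore($child, $node) instead of back to front after $node->nextSibling; both orders preserve the children's sequence. The function names unwrapElement() and unwrapElementsWithAttributes() are hypothetical, and the sketch assumes the same loadHTML() flags used above.

```php
<?php
// Hypothetical helper (not part of PHP's DOM extension): replace an element
// with its own children, keeping their order.
function unwrapElement(DOMNode $node): void
{
    $parent = $node->parentNode;
    if ($parent === null) {
        return; // detached node: there is no parent to move children into
    }
    // Take the first child each time and insert it directly before $node,
    // so the children keep their original order.
    while ($node->firstChild !== null) {
        $parent->insertBefore($node->firstChild, $node);
    }
    // $node is now empty and can be removed without losing content.
    $parent->removeChild($node);
}

// Hypothetical wrapper: unwrap every element that has at least one attribute.
function unwrapElementsWithAttributes(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    // query() returns a snapshot of the matches, so moving nodes inside
    // the loop does not disturb the iteration.
    foreach ($xpath->query('//*[@*]') as $node) {
        unwrapElement($node);
    }
    return $dom->saveHTML();
}

echo unwrapElementsWithAttributes('<div><p>keep</p><p class="x">unwrap me</p></div>');
```

Using firstChild with insertBefore($child, $node) avoids relying on nextSibling. The answer's version is still safe when the element is the last child: nextSibling is then null, and insertBefore() with a null reference node simply appends.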


A couple of asides:

  • You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
  • Your XPath expression can be simplified to //*[@*].
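Both asides can be checked quickly. This is a sketch with made-up sample HTML: with LIBXML_HTML_NODEFDTD the document ends up with no doctype node, and //*[@*] selects the same elements as //*[count(@*)>0], because a node-set used in an XPath predicate is true exactly when it is non-empty.

```php
<?php
// Sample markup for illustration only: two of the four elements carry attributes.
$html = '<div><p>a</p><p class="x">b</p><div style="c">d</div></div>';

$dom = new DOMDocument();
// LIBXML_HTML_NODEFDTD stops libxml from adding a default doctype,
// so there is no doctype node to remove afterwards.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
var_dump($dom->doctype === null);

$xpath = new DOMXPath($dom);
// Original, explicit form of the query.
$long = $xpath->query('//*[count(@*)>0]');
// Shorter form: [@*] is true whenever the attribute node-set is non-empty.
$short = $xpath->query('//*[@*]');
printf("%d %d\n", $long->length, $short->length);
```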
user3942918 answered Jan 13 '23 10:01