Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DomDocument removeChild in foreach reindexing the dom

I am trying to delete p tags with data-spotid attribute

        $dom = new DOMDocument();
        @$dom->loadHTML($description);
        $pTag = $dom->getElementsByTagName('p');

        foreach ($pTag as $value) {
            /** @var DOMElement $value */
            $id = $value->getAttribute('data-spotid');
            if ($id) {
                $value->parentNode->removeChild($value);
            }
        }

but when i am removing child it is reindexing the dom. let suppose i have 8 items i deleted 1st it will reindex it and 2nd element will become 1st and it will not delete it will go to 2nd which is now 3rd element.

like image 289
rohitarora Avatar asked Apr 28 '16 09:04

rohitarora


3 Answers

This is mentioned in a couple of comments on the DomNode::removeChild documentation, with the issue apparently being how the iterator pointer on the foreach not being able to deal with the fact that you are removing items from a parent array while looping through the list of children (or something).

The recommended fix is to loop through the main node first and push the child nodes you want to delete to its own array, then loop through that "to-be-deleted" array and deleting those children from their parent. Example:

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

$spotid_children = array();

foreach ($pTag as $value) {
    /** @var DOMElement $value */
    $id = $value->getAttribute('data-spotid');
    if ($id) {
        $spotid_children[] = $value; 
    }
}

foreach ($spotid_children as $spotid_child) {
    $spotid_child->parentNode->removeChild($spotid_child); 
}
like image 108
Anthony Avatar answered Oct 18 '22 06:10

Anthony


We can use like this:

        $dom = new DOMDocument();
        @$dom->loadHTML($description);
        $pTag = $dom->getElementsByTagName('p');
        $count = count($pTag)
        for($i = 0; $i < $count; $i++) {
            /** @var DOMElement $value */
            $value = $pTag[$i];
            $id = $value->getAttribute('data-spotid');
            if ($id) {
                $i--;$count--;
                $value->parentNode->removeChild($value);
            }
        }
like image 37
Aju John Avatar answered Oct 18 '22 05:10

Aju John


Like I commented, the easy solution would be to just cast the iterator to an array. E.g.:

$elements = iterator_to_array($elements);

But, if we're talking about performance, a better way would be to simply select only the required nodes. Neat side-effect, the removal-problem also goes away.

E.g.:

<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML(<<<__XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>1</element>
    <element attr="a">2</element>
    <element>3</element>
    <element>4</element>
    <element attr="a">5</element>
    <element attr="a">6</element>
    <element>7</element>
    <element>8</element>
</root>
__XML
);

$xpath = new DOMXPath($doc);
$elements = $xpath->query('//element[@attr]');

foreach ($elements as $element) {
    $element->parentNode->removeChild($element);
}

echo $doc->saveXML();

Demo: https://3v4l.org/CM9Fv

like image 2
Yoshi Avatar answered Oct 18 '22 06:10

Yoshi