Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP DOM Parser Moving Closing Div Tag

here's my code:

$myHtml = '
<div class="div-class">
    <p>text</p>

    <p><a href="#">text</a></p>
</div>

<ul class="some-class">
    <li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src=""></a>
    </li>
</ul>
';

$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
    $list->removeChild($a->parentNode);
}

var_dump($doc->saveHTML());

Essentially, I am trying to remove a list item that contains an anchor tag with a title of 'something something'. However, when I save the html after applying the changes, the list moves inside the div tag. Why would that occur? Thanks.

like image 372
user8709724 Avatar asked Oct 16 '22 22:10

user8709724


1 Answers

loadHTML() tries to correct syntaxis, and it doesn't like the ul element being parentless, so it moves it inside the div. If you wrap it all around a body tag, it'll work correctly.

loadHTML() actually automatically should do the wrapping for you were it necessary, but you set the LIBXML_HTML_NOIMPLIED flag, which disables this.

<?php
$myHtml = '
<html>
<body>
<div class="div-class">
    <p>text</p>

    <p><a href="#">text</a></p>
</div>

<ul class="some-class">
    <li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src=""></a>
    </li>
</ul>
</body>
</html>
';

$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
    $list->removeChild($a->parentNode);
}

var_dump($doc->saveHTML());

Demo

Or, without the LIBXML_HTML_NOIMPLIED flag:

<?php
$myHtml = '
<div class="div-class">
    <p>text</p>

    <p><a href="#">text</a></p>
</div>

<ul class="some-class">
    <li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src="" alt=""></a>
    </li>
    <li><a href="" target="_blank" title=""><img src=""></a>
    </li>
</ul>
';

$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NODEFDTD);
var_dump (libxml_get_errors());
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
    $list->removeChild($a->parentNode);
}

var_dump($doc->saveHTML());

Demo

like image 196
ishegg Avatar answered Oct 27 '22 05:10

ishegg