PHP DOMDocument saveHTML breaks format

Tags: dom, php

Why would this code:

$doc = new DOMDocument();
$doc->loadHTML($this->content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgNodes = $doc->getElementsByTagName('img');

if ($imgNodes->length > 0) {
    $inlineImage = new Image();
    $inlineImage->setPublicDir($publicDirPath);

    foreach ($imgNodes as $imgNode) {
        $inlineImage->setUri($imgNode->getAttribute('src'));
        $inlineImage->setName(basename($inlineImage->getUri()));

        // Compare against the same path that is written back below (with the '/' separator).
        if ($inlineImage->getUri() != $dstPath.'/'.$inlineImage->getName()) {
            $inlineImage->move($dstPath);

            $imgNode->setAttribute('src', $dstPath.'/'.$inlineImage->getName());
        }
    }

    $this->content = $doc->saveHTML();
}

executed on this HTML:

<p><img alt="fluid cat" src="/images/tmp/fluid-cat.jpg"></p><p><img alt="pandas" src="/images/tmp/pandas.jpg"></p>

result in this HTML:

<p><img alt="fluid cat" src="/images/full/2016-09/fluid-cat.jpg"><p><img alt="pandas" src="/images/full/2016-09/pandas.jpg"></p></p>

Why does it nest the second p block (and its img tag) inside the first p block?

EmilCataranciuc asked Dec 14 '22 03:12


1 Answer

Your HTML sample has no single root element wrapping everything. With LIBXML_HTML_NOIMPLIED, libxml does not add the implied html/body elements, so when it parses the HTML to build the DOM tree it assumes that the first tag it encounters is the root element. As a consequence, the first closing </p> tag is seen as an orphan closing tag (because there is content after it) and is automatically removed, and a </p> is added at the end to close the root element.

To avoid these automatic fixes when you are working with an HTML fragment (not a whole HTML document), wrap the fragment in a fake root element. At the end, to produce the result string, save each child node of this fake root element. Example:

$html = '<p><img alt="fluid cat" src="/images/tmp/fluid-cat.jpg"></p><p><img alt="pandas" src="/images/tmp/pandas.jpg"></p>';

$doc = new DOMDocument;
$doc->loadHTML( '<div>' . $html . '</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
#               ^-----------------^----- fake root element
$root = $doc->documentElement;

$result = '';

foreach($root->childNodes as $childNode) {
    $result .= $doc->saveHTML($childNode);
}

echo $result;
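
If it helps, the same idea can be folded into the src rewriting from the question. This is only a rough sketch: the function name rewriteImgSrc and the callback parameter are made up for illustration, the target directory /images/full/2016-09 is taken from the question's output, and the actual file move done by the Image class is left out.

```php
<?php
// Hypothetical helper (not part of the original answer): rewrites the src of
// every <img> in an HTML fragment and returns the fragment without the wrapper.
function rewriteImgSrc(string $fragment, callable $rewrite): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(
        '<div>' . $fragment . '</div>', // fake root element
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $img->setAttribute('src', $rewrite($img->getAttribute('src')));
    }

    // Serialize only the children of the fake root, so the <div> is dropped.
    $result = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }

    return $result;
}

$html = '<p><img alt="fluid cat" src="/images/tmp/fluid-cat.jpg"></p>'
      . '<p><img alt="pandas" src="/images/tmp/pandas.jpg"></p>';

// Assumed destination directory, copied from the output shown in the question.
$out = rewriteImgSrc($html, function (string $src): string {
    return '/images/full/2016-09/' . basename($src);
});

echo $out, PHP_EOL;
```

Passing the rewrite step as a callback keeps the DOM handling separate from whatever decides the new path, so the move logic from the question could be plugged in there.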
Casimir et Hippolyte answered Dec 16 '22 18:12