I need to load some arbitrary HTML into an existing DOMDocument tree. Previous answers suggest using DOMDocumentFragment and its appendXML method to handle this.
As @Owlvark indicates in the comments, XML is not HTML, and therefore this is not a good solution.
The main issue that I had with it was that entities like &ndash; were causing errors, because the appendXML method expects well-formed XML.
We could define the entities, but this doesn't take care of the problem that not all HTML is valid XML.
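To make the failure concrete, here is a minimal sketch. The snippet string is made up for illustration: it is ordinary HTML, but it uses an entity that XML does not define and a `<br>` that is never closed. As far as I can tell, appendXML rejects it while the HTML parser accepts it.

```php
<?php
// Collect libxml parse errors instead of printing them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();

// Illustrative input: fine as HTML, not well-formed as XML.
$snippet = '<p>2010 &ndash; 2012<br>done</p>';

// appendXML() returns false when the string cannot be parsed as XML.
$ok = $fragment->appendXML($snippet);
var_dump($ok);

// Show what the XML parser complained about.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// The HTML parser knows the HTML entity set and void elements such as <br>.
$hdoc = new DOMDocument();
$hdoc->loadHTML($snippet);
echo $hdoc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Defining &ndash; in a DTD would silence the first error, but the unclosed `<br>` would still break the XML parse.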
What is a good solution for importing HTML into a DOMDocument tree?
The solution that I came up with is to use DOMDocument::loadHTML, as @FrankFarmer suggests, and then to take the parsed nodes and import them into my current document. My implementation looks like this:
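    /**
     * Parses HTML into DOM nodes owned by the target document.
     * @param string $html the raw HTML to transform
     * @param \DOMDocument $doc the document to import the nodes into
     * @return array an array of DOMNodes on success or an empty array on failure
     */
    protected function htmlToDOM($html, $doc) {
        $html = '<div id="html-to-dom-input-wrapper">' . $html . '</div>';
        // loadHTML() is an instance method; calling it statically fails on PHP 8.
        $hdoc = new \DOMDocument();
        // Collect parse warnings from real-world HTML instead of emitting them.
        $previous = libxml_use_internal_errors(true);
        $loaded = $hdoc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $child_array = array();
        // A missing wrapper does not throw an Exception, so check for it explicitly.
        $wrapper = $loaded ? $hdoc->getElementById('html-to-dom-input-wrapper') : null;
        if ($wrapper === null) {
            error_log('htmlToDOM: could not parse the HTML input', 0);
            return $child_array;
        }
        foreach ($wrapper->childNodes as $child) {
            // Deep-copy each node into $doc; the caller still has to append it.
            $child_array[] = $doc->importNode($child, true);
        }
        return $child_array;
    }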
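For completeness, here is a rough sketch of how the returned nodes might be attached to an existing tree. The function html_to_nodes is a hypothetical standalone variant of the method above, and the article/section document is invented for the example. The point it shows is that importNode only copies a node into the target document; the copy does not appear in the tree until it is appended.

```php
<?php
// Hypothetical standalone variant of htmlToDOM() so the sketch runs on its own.
function html_to_nodes(string $html, DOMDocument $doc): array {
    $hdoc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $hdoc->loadHTML('<div id="html-to-dom-input-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $hdoc->getElementById('html-to-dom-input-wrapper');
    $nodes = [];
    if ($wrapper !== null) {
        foreach ($wrapper->childNodes as $child) {
            // Deep copy owned by $doc, not yet attached anywhere.
            $nodes[] = $doc->importNode($child, true);
        }
    }
    return $nodes;
}

// Made-up existing tree that we want to extend.
$doc = new DOMDocument();
$doc->loadXML('<article><section id="target"/></article>');
$target = $doc->getElementsByTagName('section')->item(0);

// Attach each imported node to the chosen parent.
foreach (html_to_nodes('<p>2010 &ndash; 2012</p><br>tail text', $doc) as $node) {
    $target->appendChild($node);
}

echo $doc->saveXML($target), "\n";
```

One caveat with the wrapper approach: if the input itself contains a stray closing div tag, the parser may close the wrapper early, and anything after that point would not be returned.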