
PHP DOM append HTML to existing document without DOMDocumentFragment::appendXML

I need to load some arbitrary HTML into an existing DOMDocument tree. Previous answers suggest using DOMDocumentFragment and its appendXML method to handle this.

As @Owlvark indicates in the comments, XML is not HTML, so this is not a good solution.

The main issue I had with it was that entities like &ndash; caused errors, because the appendXML method expects well-formed XML.

We could define the entities, but that does not address the underlying problem: not all HTML is valid XML.
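
A minimal sketch of the failure described above, assuming PHP with the DOM extension enabled. The sample markup and variable names are illustrative and not taken from the question. It feeds the same string to appendXML and to loadHTML while collecting libxml errors quietly:

```php
<?php
// Illustrative input: valid HTML, but &ndash; is not a predefined XML entity.
$snippet = '<p>2010&ndash;2012</p>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// appendXML runs an XML parser over the string.
$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
$xmlOk = $fragment->appendXML($snippet);
$xmlErrors = libxml_get_errors();
libxml_clear_errors();

// loadHTML runs the HTML parser, which knows the HTML entity set.
$hdoc = new DOMDocument();
$htmlOk = $hdoc->loadHTML($snippet);
$text = $hdoc->getElementsByTagName('p')->item(0)->textContent;
libxml_clear_errors();

var_dump($xmlOk, count($xmlErrors), $htmlOk, $text);
```

On a typical setup, appendXML should return false and record an undefined-entity error, while loadHTML resolves the entity to an en dash in the paragraph text.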

What is a good solution for importing HTML into a DOMDocument tree?

wmarbut asked Sep 11 '12 19:09



1 Answer

The solution that I came up with is to use DOMDocument::loadHTML, as @FrankFarmer suggests, and then to import the parsed nodes into my current document. My implementation looks like this:

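/**
 * Parses an HTML string and imports the resulting nodes into an existing document.
 * @param string $html the raw HTML to transform
 * @param \DOMDocument $doc the document to import the nodes into
 * @return array an array of DOMNodes owned by $doc on success, or an empty array on failure
 */
protected function htmlToDOM($html, $doc) {
    // Wrap the input so its top-level nodes can be located after parsing.
    $html = '<div id="html-to-dom-input-wrapper">' . $html . '</div>';
    // loadHTML is an instance method; calling it statically fails on PHP 8.
    $hdoc = new \DOMDocument();
    // Collect parser warnings about imperfect HTML instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $hdoc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $child_array = array();
    $wrapper = $loaded ? $hdoc->getElementById('html-to-dom-input-wrapper') : null;
    if ($wrapper === null) {
        // A null wrapper does not throw, so check it explicitly rather than with try/catch.
        error_log('htmlToDOM: could not parse the supplied HTML', 0);
        return $child_array;
    }
    foreach ($wrapper->childNodes as $child) {
        // importNode makes a deep copy owned by $doc; the copy is not yet attached to the tree.
        $child_array[] = $doc->importNode($child, true);
    }
    return $child_array;
}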
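
The nodes returned above belong to the target document but are not yet attached to it. As a rough sketch of the remaining step, the hypothetical standalone helper below (`appendHtml`, the `wrapper` id, and the sample markup are assumptions for illustration, not part of the answer) follows the same wrap, parse, and import approach and then appends each imported node under a target element:

```php
<?php
// Hypothetical helper: parse $html and append the result under $target.
function appendHtml(DOMElement $target, string $html): void
{
    $hdoc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $hdoc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $hdoc->getElementById('wrapper');
    if ($wrapper === null) {
        return; // nothing usable was parsed; leave $target untouched
    }
    foreach ($wrapper->childNodes as $child) {
        // importNode copies the node into the target's document;
        // appendChild then attaches that copy to the tree.
        $copy = $target->ownerDocument->importNode($child, true);
        $target->appendChild($copy);
    }
}

// Usage: append two parsed nodes to an existing element.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="content"></div></body></html>');
$content = $doc->getElementById('content');
appendHtml($content, '<p>2010&ndash;2012</p><em>done</em>');
echo $doc->saveHTML($content), "\n";
```

Importing before appending matters because appendChild rejects nodes that belong to a different document.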
wmarbut answered Oct 06 '22 19:10
