Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP DOMDocument replace DOMElement child with HTML string

Using PHP I'm attempting to take an HTML string passed from a WYSIWYG editor and replace the children of an element inside of a preloaded HTML document with the new HTML.

So far I'm loading the document identifying the element I want to change by ID but the process to convert an HTML to something that can be placed inside a DOMElement is eluding me.

libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$element = $doc->getElementById($item_id);
if(isset($element)){
    //Remove the old children from the element
    while($element->childNodes->length){
        $element->removeChild($element->firstChild);
    }

    //Need to build the new children from $html_string and append to $element
}
like image 566
Andrew Winter Avatar asked Feb 10 '10 00:02

Andrew Winter


2 Answers

If the HTML string can be parsed as XML, you can do this (after clearing the element of all child nodes):

$fragment = $doc->createDocumentFragment();
$fragment->appendXML($html_string);
$element->appendChild($fragment);

If $html_string cannot be parsed as XML, it will fail. If it does, you’ll have to use loadHTML(), which is less strict — but it will add elements around the fragment which you will have to strip.

Unlike PHP, Javascript has the innerHTML property which allows you to do this very easily. I needed something like it for a project so I extended PHP’s DOMElement to include Javascript-like innerHTML access.

With it you can access the innerHTML property and change it just as you would in Javascript:

echo $element->innerHTML;
$elem->innerHTML = '<a href="http://example.org">example</a>';

Source: http://www.keyvan.net/2012/11/php-domdocument-replace-domelement-child-with-html-string/

like image 188
Keyvan Avatar answered Sep 18 '22 11:09

Keyvan


The current accepted answer suggests using appendXML(), but acknowledges that it won't handle complex html such as what is returned from a WYSISYG editor as specified in the original question. As suggested loadHTML() can address this. but no one has yet shown how.

This is what I believe is the best/correct answer to the original question addressing encoding issues, "Document Fragment is empty" warnings and "Wrong Document Error" errors that someone is likely to hit if they write this from scratch. I know I found them after following the hints in the previous responses.

This is code from a site I support that inserts WordPress sidebar content into the $content of a post. It assumes that $doc is a valid DOMDocument similar to the way $doc is defined in the original question. It also assumes that $element is the tag after which you wish to insert the sidebarcontent (or whatever).

            // NOTE: Cannot use a document fragment here as the AMP html is too complex for the appendXML function to accept.
            // Instead create it as a document element and insert that way.
            $node = new DOMDocument();
            // Note that we must encode it correctly or strange characters may appear.
            $node->loadHTML( mb_convert_encoding( $sidebarContent, 'HTML-ENTITIES', 'UTF-8') );
            // Now we need to move this document element into the scope of the content document 
            // created above or the insert/append will be rejected.
            $node = $doc->importNode( $node->documentElement, true );
            // If there is a next sibling, insert before it.
            // If not, just add it at the end of the element we did find.
            if (  $element->nextSibling ) {
                $element->parentNode->insertBefore( $node, $element->nextSibling );
            } else {
                $element->parentNode->appendChild($node);
            }

After all of this is done, if you don't want to have the source of a full HTML document with body tags and what not, you can generate the more localized html with this:

    // Now because we have moved the post content into a full document, we need to get rid of the 
    // extra elements that make it a document and not a fragment
    $body = $doc->getElementsByTagName( 'body' );
    $body = $body->item(0);

    // If you need an element with a body tag, you can do this.
    // return $doc->savehtml( $body );

    // Extract the html from the body tag piece by piece to ensure valid html syntax in destination document
    $bodyContent = ''; 
    foreach( $body->childNodes as $node ) { 
            $bodyContent .= $body->ownerDocument->saveHTML( $node ); 
    } 
    // Now return the full content with the new content added. 
    return $bodyContent;
like image 24
Brian Layman Avatar answered Sep 16 '22 11:09

Brian Layman