Remove parent element, keep all inner children in DOMDocument with saveHTML

Tags:

php

xpath

domdocument

I'm manipulating a short HTML snippet with XPath; when I output the changed snippet back with $doc->saveHTML(), DOCTYPE gets added, and HTML / BODY tags wrap the output. I want to remove those, but keep all the children inside by only using the DOMDocument functions. For example:

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
echo htmlentities( $doc->saveHTML() );

This produces:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...>
<html><body>
<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>
</body></html>

I've attempted some of the simple tricks, such as:

# removes doctype
$doc->removeChild($doc->firstChild);

# <body> replaces <html>
$doc->replaceChild($doc->firstChild->firstChild, $doc->firstChild);

So far that only removes the DOCTYPE and replaces HTML with BODY. What remains at this point is a body element wrapping a variable number of child elements.

How do I remove the <body> tag but keep all of its children, given that they will be structured variably, in a neat, clean way with PHP's DOM manipulation?

268

asked May 02 '12 15:05

pp19dd

3 Answers

UPDATE

Here's a version that doesn't extend DOMDocument, though I think extending is the proper approach, since you're trying to achieve functionality that isn't built into the DOM API.

Note: I'm interpreting "clean" and "without workarounds" as keeping all manipulation to the DOM API. As soon as you hit string manipulation, that's workaround territory.

What I'm doing, just as in the original answer, is leveraging DOMDocumentFragment to manipulate multiple nodes all sitting at the root level. There is no string manipulation going on, which to me qualifies as not being a workaround.

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild($doc->doctype);

// Remove html element, preserving child nodes
$html = $doc->getElementsByTagName("html")->item(0);
$fragment = $doc->createDocumentFragment();
while ($html->childNodes->length > 0) {
    $fragment->appendChild($html->childNodes->item(0));
}
$html->parentNode->replaceChild($fragment, $html);

// Remove body element, preserving child nodes
$body = $doc->getElementsByTagName("body")->item(0);
$fragment = $doc->createDocumentFragment();
while ($body->childNodes->length > 0) {
    $fragment->appendChild($body->childNodes->item(0));
}
$body->parentNode->replaceChild($fragment, $body);

// Output results
echo htmlentities($doc->saveHTML());
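
The two nearly identical loops above can be folded into one reusable function. The following is a minimal sketch of that idea, not part of the DOM API: the name unwrapElement is hypothetical, the type declarations assume PHP 7.1 or later, and it relies on DOMDocument tolerating several nodes at the root level, as the code above does.

```php
<?php
// Hypothetical helper (not a built-in DOM method): replaces an element
// with its own child nodes, using only DOM API calls.
function unwrapElement(DOMElement $element): void
{
    $parent = $element->parentNode;
    if ($parent === null) {
        return; // detached node: there is nothing to unwrap into
    }

    $fragment = $element->ownerDocument->createDocumentFragment();
    // appendChild() moves the node, so firstChild advances on every pass.
    while ($element->firstChild !== null) {
        $fragment->appendChild($element->firstChild);
    }

    if ($fragment->hasChildNodes()) {
        $parent->replaceChild($fragment, $element);
    } else {
        $parent->removeChild($element); // empty element: just drop it
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><p>...to be one of those crowning achievements...</p>');

// Remove the doctype node if the parser added one.
if ($doc->doctype !== null) {
    $doc->removeChild($doc->doctype);
}

// Unwrap the outer element first, then the inner one.
foreach (['html', 'body'] as $tagName) {
    $element = $doc->getElementsByTagName($tagName)->item(0);
    if ($element !== null) {
        unwrapElement($element);
    }
}

$result = $doc->saveHTML();
echo $result;
```

The same function should work for any wrapper element, not only html and body, since it only touches the element and its parent.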

ORIGINAL ANSWER

This solution is rather lengthy because it extends the DOM classes so that your calling code stays as short as possible.

sliceOutNode is where the magic happens. Let me know if you have any questions:

<?php

class DOMDocumentExtended extends DOMDocument
{
    public function __construct( $version = "1.0", $encoding = "UTF-8" )
    {
        parent::__construct( $version, $encoding );

        $this->registerNodeClass( "DOMElement", "DOMElementExtended" );
    }

    // This method will need to be removed once PHP supports LIBXML_NOXMLDECL
    public function saveXML( DOMNode $node = NULL, $options = 0 )
    {
        $xml = parent::saveXML( $node, $options );

        if( $options & LIBXML_NOXMLDECL )
        {
            $xml = $this->stripXMLDeclaration( $xml );
        }

        return $xml;
    }

    public function stripXMLDeclaration( $xml )
    {
        return preg_replace( "|<\?xml(.+?)\?>[\n\r]?|i", "", $xml );
    }
}

class DOMElementExtended extends DOMElement
{
    public function sliceOutNode()
    {
        $nodeList = new DOMNodeListExtended( $this->childNodes );
        $this->replaceNodeWithNode( $nodeList->toFragment( $this->ownerDocument ) );
    }

    public function replaceNodeWithNode( DOMNode $node )
    {
        return $this->parentNode->replaceChild( $node, $this );
    }
}

class DOMNodeListExtended extends ArrayObject
{
    public function __construct( $mixedNodeList )
    {
        parent::__construct( array() );

        $this->setNodeList( $mixedNodeList );
    }

    private function setNodeList( $mixedNodeList )
    {
        if( $mixedNodeList instanceof DOMNodeList )
        {
            $this->exchangeArray( array() );

            foreach( $mixedNodeList as $node )
            {
                $this->append( $node );
            }
        }
        elseif( is_array( $mixedNodeList ) )
        {
            $this->exchangeArray( $mixedNodeList );
        }
        else
        {
            throw new DOMException( "DOMNodeListExtended only supports a DOMNodeList or array as its constructor parameter." );
        }
    }

    public function toFragment( DOMDocument $contextDocument )
    {
        $fragment = $contextDocument->createDocumentFragment();

        foreach( $this as $node )
        {
            $fragment->appendChild( $contextDocument->importNode( $node, true ) );
        }

        return $fragment;
    }

    // Built-in methods of the original DOMNodeList

    public function item( $index )
    {
        return $this->offsetGet( $index );
    }

    public function __get( $name )
    {
        switch( $name )
        {
            case "length":
                return $this->count();
            break;
        }

        return false;
    }
}

// Load HTML/XML using our fancy DOMDocumentExtended class
$doc = new DOMDocumentExtended();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild( $doc->doctype );

// Slice out html node
$html = $doc->getElementsByTagName("html")->item(0);
$html->sliceOutNode();

// Slice out body node
$body = $doc->getElementsByTagName("body")->item(0);
$body->sliceOutNode();

// Pick your poison: XML or HTML output
echo htmlentities( $doc->saveXML( NULL, LIBXML_NOXMLDECL ) );
echo htmlentities( $doc->saveHTML() );

115

answered Nov 10 '22 00:11

matb33

saveHTML can output a subset of the document (since PHP 5.3.6 it accepts a node argument), so we can ask it to output each child node one by one by traversing body.

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<a href="http://google.com"><img src="http://google.com/img.jpeg" alt=""></a>
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here

// Let's traverse the body and output every child node
$bodyNode = $doc->getElementsByTagName('body')->item(0);
foreach ($bodyNode->childNodes as $childNode) {
  echo $doc->saveHTML($childNode);
}

This might not be the most elegant solution, but it works. Alternatively, we can wrap all child nodes inside a container element (say, a div) and output only that container, although the container's own tag will then be included in the output.
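
The container alternative could look roughly like this. It is only a sketch under a few assumptions: the div wrapper and the variable names are illustrative, and the shortened input string stands in for the snippet used above.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><p>...to be one of those crowning achievements...</p>');

$body = $doc->getElementsByTagName('body')->item(0);

// Illustrative wrapper: a <div> that takes over all of body's children.
$container = $doc->createElement('div');
while ($body->firstChild !== null) {
    // appendChild() moves the node out of body and into the container.
    $container->appendChild($body->firstChild);
}
$body->appendChild($container);

// Serialize only the container; the doctype, <html> and <body> are outside it.
$wrapped = $doc->saveHTML($container);
echo $wrapped;
```

The trade-off is the one noted above: the output is a single well-formed element, but it carries the extra div tag.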

answered Nov 10 '22 00:11

galymzhan

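Here's how I've done it:

-- Quick helper function that gives you the HTML contents of a specific DOM element

function nodeContent($n, $outer = false) {
    // Copy the node into a fresh document so that only this node is serialized
    $d = new DOMDocument('1.0');
    $b = $d->importNode($n->cloneNode(true), true);
    $d->appendChild($b);
    $h = $d->saveHTML();
    // Remove the outer tags: skip past the first '>' and cut the closing tag
    // (this assumes saveHTML() ends its output with a single newline)
    if (!$outer) {
        $h = substr($h, strpos($h, '>') + 1, -(strlen($n->nodeName) + 4));
    }
    return $h;
}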

-- Find the body node in your doc and get its contents

$xpath = new DOMXPath($doc);
$query = $xpath->query("//body")->item(0);
if ($query)
{
    echo nodeContent($query);
}

UPDATE 1:

Some extra info: since PHP 5.3.6, DOMDocument::saveHTML() accepts an optional DOMNode parameter, similarly to DOMDocument::saveXML(). Note that the output includes the tags of the node you pass in. You can do:

$xpath = new DOMXPath($doc);
$query = $xpath->query("//body")->item(0);
echo $doc->saveHTML($query);

For older PHP versions, the helper function above will help.

answered Nov 09 '22 22:11

Alexey Gerasimov
