
Illegal self-closing node notation for empty nodes - outputting XHTML with PHP DOMDocument

I am processing XML-compliant XHTML input with XPath in PHP like this:

$xml=new DOMDocument();
$xml->loadXML(utf8_encode($temp));
[...]
$temp=utf8_decode($xml->saveXML());

The problem is that elements which must not be self-closing according to the HTML5 spec, e.g.

<textarea id="something"></textarea>

or a div that is meant to be used by JS

<div id="someDiv" class="whaever"></div>

come back out as

<textarea id="something" />

and

<div id="someDiv" class="whaever" />

I currently work around this with str_replace, but that's nonsense, as I need to match individual cases. How can I solve this?
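For reference, a minimal reproduction of the behaviour described above. The wrapping <body> element and the variable names are assumptions for illustration only; the rest mirrors the snippets in the question.

```php
<?php
// Minimal reproduction sketch; the <body> wrapper is a hypothetical root element.
$xml = new DOMDocument();
$xml->loadXML(
    '<body><textarea id="something"></textarea>'
    . '<div id="someDiv" class="whaever"></div></body>'
);

// Serialize only the root element so the XML declaration is skipped.
$out = $xml->saveXML($xml->documentElement);

// Both empty elements are collapsed to the self-closing form.
echo $out, "\n";
// <body><textarea id="something"/><div id="someDiv" class="whaever"/></body>
```

The input contains no DOCTYPE, so the document is serialized as plain XML and every childless element is written in its short form.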

At the same time, XPath insists on outputting

xmlns:default="http://www.w3.org/1999/xhtml"

and on freshly created nodes it puts things like <default:p>. How do I stop that without resorting to a stupid search and replace like this:

$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml" '," ",$temp);
$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml"'," ",$temp);
$temp=str_replace('<default:',"<",$temp);
$temp=str_replace('</default:',"</",$temp);

?

EDIT: I'm really running into trouble with the stupid search and replace, and I do not intend to attack the output XHTML with regular expressions. Consider this example:

<div id="videoPlayer0" class="videoPlayerPlacement" data-xml="video/cp_IV_a_1.xml"/>

Obviously, self-closing divs are illegal (at least in the one context where I cannot serve the output as MIME type application/xhtml+xml but am forced to use text/html), and in all other cases they certainly don't validate.

asked Dec 02 '15 03:12 by C.O.


1 Answer

Sorry for the late reply, but you know... it was Christmas. :D

function export_html(DOMDocument $dom)
{
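        // HTML void elements ('command' and 'keygen' are obsolete but harmless here).
        // 'colgroup' is not a void element, so it is not listed.
        $voids = ['area',
                  'base',
                  'br',
                  'col',
                  'command',
                  'embed',
                  'hr',
                  'img',
                  'input',
                  'keygen',
                  'link',
                  'meta',
                  'param',
                  'source',
                  'track',
                  'wbr'];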

        // Every empty node. There is no reason to match nodes with content inside.
        $query = '//*[not(node())]';
        $nodes = (new DOMXPath($dom))->query($query);

        foreach ($nodes as $n) {
                // localName ignores any namespace prefix such as "default:".
                if (! in_array($n->localName, $voids, true)) {
                        // If it is not a void/empty tag,
                        // we need to leave the tag open.
                        $n->appendChild(new DOMComment('NOT_VOID'));
                }
        }

        // Let's remove the placeholder.
        return str_replace('<!--NOT_VOID-->', '', $dom->saveXML());
}
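To make it clearer which elements the '//*[not(node())]' query picks up, here is a small sketch. The element names a to e and the <root> wrapper are made up for illustration. The point is that an element counts as empty only if it has no child nodes at all, so whitespace text and comments already keep a tag open.

```php
<?php
// Hypothetical sample document: only <a/> and <b></b> have no child nodes.
$dom = new DOMDocument();
$dom->loadXML(
    '<root><a/><b></b><c> </c><d><!-- note --></d><e>text</e></root>'
);

$xpath = new DOMXPath($dom);

$names = [];
foreach ($xpath->query('//*[not(node())]') as $node) {
    // Collect the names of elements without any child node.
    $names[] = $node->nodeName;
}

echo implode(',', $names), "\n";
// a,b
```

Note that <a/> and <b></b> are indistinguishable once parsed, which is why the placeholder comment is needed to force the long form on output.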

In your example

$dom = new DOMDocument();
$dom->loadXML(<<<XML
<html>
        <textarea id="something"></textarea>
        <div id="someDiv" class="whaever"></div>
</html>
XML
);

echo export_html($dom); will produce

<?xml version="1.0"?>
<html>
    <textarea id="something"></textarea>
    <div id="someDiv" class="whaever"></div>
</html>
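For comparison, libxml also has a LIBXML_NOEMPTYTAG option that can be passed to saveXML(). The sketch below (sample markup and variable names are assumptions) shows why it is not enough on its own here: it expands every empty element, including void ones such as <br/>, which is why the function above uses a list of void elements instead.

```php
<?php
// Hypothetical sample: one void element (<br/>) and one empty non-void element.
$dom = new DOMDocument();
$dom->loadXML('<html><br/><div id="someDiv"></div></html>');

// LIBXML_NOEMPTYTAG expands all empty elements, with no exception for void ones.
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

echo $expanded, "\n";
// <html><br></br><div id="someDiv"></div></html>
```

The div comes out as wanted, but <br></br> is not what you want when the result is served as text/html.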

Merry Christmas! ^_^

answered Oct 31 '22 00:10 by Daniele Orlando