I am processing XML-compliant XHTML input with XPath in PHP, like this:
$xml=new DOMDocument();
$xml->loadXML(utf8_encode($temp));
[...]
$temp=utf8_decode($xml->saveXML());
The problem is that elements that must not be self-closing according to the HTML5 spec, e.g.
<textarea id="something"></textarea>
or a div used as a hook for JS
<div id="someDiv" class="whaever"></div>
come back out as
<textarea id="something" />
and
<div id="someDiv" class="whaever" />
I currently work around this with str_replace, but that's nonsense because I have to match each case individually. How can I solve this?
At the same time, XPath insists on emitting
xmlns:default="http://www.w3.org/1999/xhtml"
and on freshly created nodes it emits things like <default:p>. How do I stop that without resorting to a stupid search and replace like this:
$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml" '," ",$temp);
$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml"'," ",$temp);
$temp=str_replace('<default:',"<",$temp);
$temp=str_replace('</default:',"</",$temp);
?
EDIT: I'm really running into trouble with the stupid search and replace, and I do not intend to attack the output XHTML with regular expressions. Consider this example:
<div id="videoPlayer0" class="videoPlayerPlacement" data-xml="video/cp_IV_a_1.xml"/>
Obviously, self-closing divs are illegal (at least in the one context where I cannot serve the output as application/xhtml+xml and am forced to use text/html), and in all other cases they certainly don't validate.
Sorry for the late reply, but you know... it was Christmas. :D
function export_html(DOMDocument $dom)
{
    // HTML void elements: the only ones that may be serialized as <tag/>.
    $voids = ['area', 'base', 'br', 'col', 'command', 'embed', 'hr',
              'img', 'input', 'keygen', 'link', 'meta', 'param',
              'source', 'track', 'wbr'];

    // Every empty node. There is no reason to match nodes with content inside.
    $query = '//*[not(node())]';
    $nodes = (new DOMXPath($dom))->query($query);

    foreach ($nodes as $n) {
        // localName ignores any namespace prefix (e.g. default:br).
        if (! in_array($n->localName, $voids, true)) {
            // If it is not a void/empty tag,
            // we need to leave the tag open.
            $n->appendChild(new DOMComment('NOT_VOID'));
        }
    }

    // Let's remove the placeholder.
    return str_replace('<!--NOT_VOID-->', '', $dom->saveXML());
}
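To see which elements the query //*[not(node())] selects, here is a minimal sketch. The sample markup and the variable names are made up for illustration; it only relies on the standard DOMDocument and DOMXPath classes. Attributes are not child nodes, so an element that has only attributes still counts as empty, while an element containing nothing but whitespace does not (with the default preserveWhiteSpace setting).

```php
<?php
// Hypothetical sample markup, not taken from the question.
$dom = new DOMDocument();
$dom->loadXML('<html><div id="a"></div><p>text</p><span> </span><br/></html>');

$xpath = new DOMXPath($dom);

// Collect the names of all elements that have no child nodes at all.
$empty = [];
foreach ($xpath->query('//*[not(node())]') as $node) {
    $empty[] = $node->nodeName;
}

echo implode(', ', $empty), "\n"; // div, br
```

The p and span elements are skipped because each contains a text node, so only div and br would be candidates for the placeholder comment.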
Applied to your example,
$dom = new DOMDocument();
$dom->loadXML(<<<XML
<html>
<textarea id="something"></textarea>
<div id="someDiv" class="whaever"></div>
</html>
XML
);
echo export_html($dom);
will produce
<?xml version="1.0"?>
<html>
<textarea id="something"></textarea>
<div id="someDiv" class="whaever"></div>
</html>
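For comparison, libxml also offers the LIBXML_NOEMPTYTAG option for saveXML(). The sketch below, using made-up sample markup, shows why it is not enough on its own here: it expands every empty element, so void elements such as br come out as <br></br>, which is just as wrong in text/html as a self-closing div. That is the reason the function above marks only the non-void elements.

```php
<?php
// Hypothetical sample markup, not taken from the question.
$dom = new DOMDocument();
$dom->loadXML('<html><div id="someDiv"></div><br/></html>');

// Default serialization collapses every empty element.
$collapsed = $dom->saveXML($dom->documentElement);

// LIBXML_NOEMPTYTAG expands every empty element, void or not.
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

echo $collapsed, "\n"; // <html><div id="someDiv"/><br/></html>
echo $expanded, "\n";  // <html><div id="someDiv"></div><br></br></html>
```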
Merry Christmas! ^_^