This sounds like a pretty easy question to answer but I haven't been able to get it to work. I'm running PHP 5.2.6.
I have a DOM element (the root element) which outputs an xmlns attribute when I call $element->saveXML():
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
...
However, I cannot find any way programmatically within PHP to see that namespace. I want to be able to check whether it exists and what it's set to.
Checking $document->documentElement->namespaceURI would be the obvious answer, but that is empty (I've never actually been able to get that to be non-empty). What is generating that xmlns value in the output and how can I read it?
The only practical way I've been able to do this so far is a complete hack - by saving it as XML to a string using saveXML() then reading through that using regular expressions.
Edit:
This may be a peculiarity of loading the markup with loadHTML() rather than loadXML() and then printing it out with saveXML(). When you do that, saveXML() appears to add an xmlns attribute even though there is no way to detect, using DOM methods, that this xmlns value is part of the document. So if I had a way of detecting whether the document passed in had been loaded with loadHTML(), I could solve this a different way.
Like edorian already showed, getting the namespace works fine when the markup is loaded with loadXML. But you are right that this won't work for markup loaded with loadHTML:
$html = <<< XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="foo" lang="en">
<body xmlns="foo">Bar</body>
</html>
XML;
$dom = new DOMDocument;
$dom->loadHTML($html);
var_dump($dom->documentElement->getAttribute("xmlns"));
var_dump($dom->documentElement->lookupNamespaceURI(NULL));
var_dump($dom->documentElement->namespaceURI);
will produce empty results. But you can use XPath:
$xp = new DOMXPath($dom);
echo $xp->evaluate('string(@xmlns)');
// http://www.w3.org/1999/xhtml
and for body
echo $xp->evaluate('string(body/@xmlns)'); // foo
or with context node
$body = $dom->documentElement->childNodes->item(0);
echo $xp->evaluate('string(@xmlns)', $body);
// foo
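The two cases can be folded into one lookup. The following is only a sketch: getElementNamespace() is a hypothetical helper name, not something from the DOM extension, and it assumes that loadHTML() leaves xmlns readable through the XPath attribute axis as described above.

```php
<?php
// Hypothetical helper, not part of the original answers.
// Returns the namespace URI that applies to $element, or null if none is found.
function getElementNamespace(DOMElement $element)
{
    // Documents parsed with loadXML() expose the namespace directly.
    if ($element->namespaceURI) {
        return $element->namespaceURI;
    }

    // Documents parsed with loadHTML() keep xmlns as a plain attribute,
    // which the XPath attribute axis can still read.
    $xp  = new DOMXPath($element->ownerDocument);
    $uri = $xp->evaluate('string(@xmlns)', $element);

    return $uri !== '' ? $uri : null;
}

$markup = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><body>Bar</body></html>';

$xmlDom = new DOMDocument;
$xmlDom->loadXML($markup);
var_dump(getElementNamespace($xmlDom->documentElement));

$htmlDom = new DOMDocument;
$htmlDom->loadHTML($markup);
var_dump(getElementNamespace($htmlDom->documentElement));
```

The loadHTML() branch relies on parser behaviour rather than on anything the DOM API promises, so it is worth checking against your own PHP and libxml versions.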
My uneducated assumption is that internally, an HTML document is different from a real document. Internally, libxml uses a different module to parse HTML, and the DOMDocument itself will be of a different nodeType, as you can simply verify by doing
var_dump($dom->nodeType); // 13 with loadHTML, 9 with loadXML
with 13 being XML_HTML_DOCUMENT_NODE.
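That nodeType difference also gives a way to tell how a document was loaded, which is what the question's edit was asking for. A minimal sketch, assuming the nodeType values quoted above hold on your build; isHtmlDocument() is a hypothetical helper name:

```php
<?php
// Hypothetical helper, not part of the original answers.
// Reports whether a DOMDocument was populated by the HTML parser,
// judged solely by its nodeType.
function isHtmlDocument(DOMDocument $dom)
{
    return $dom->nodeType === XML_HTML_DOCUMENT_NODE; // constant value is 13
}

$markup = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><body>Bar</body></html>';

$fromHtml = new DOMDocument;
$fromHtml->loadHTML($markup);

$fromXml = new DOMDocument;
$fromXml->loadXML($markup);

var_dump(isHtmlDocument($fromHtml)); // expected true, given nodeType 13 after loadHTML
var_dump(isHtmlDocument($fromXml));  // expected false, given nodeType 9 after loadXML
```

With a check like this, code that receives an arbitrary DOMDocument could choose between reading namespaceURI and falling back to the XPath lookup.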
With PHP 5.2.6 I've found two ways to do this:
<?php
$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?'.
'><html xmlns="http://www.w3.org/1999/xhtml" lang="en"></html>';
// Call loadXML() on an instance; the static call is not reliable across PHP versions.
$x = new DOMDocument;
$x->loadXML($xml);
var_dump($x->documentElement->getAttribute("xmlns"));
var_dump($x->documentElement->lookupNamespaceURI(NULL));
prints
string(28) "http://www.w3.org/1999/xhtml"
string(28) "http://www.w3.org/1999/xhtml"
Hope that's what you asked for :)
Well, you can do so with a function like this:
function getNamespaces(DomNode $node, $recurse = false) {
    $namespaces = array();
    // Namespace the node itself belongs to, if any.
    if ($node->namespaceURI) {
        $namespaces[] = $node->namespaceURI;
    }
    // Default namespace declared on the element via xmlns.
    if ($node instanceof DomElement && $node->hasAttribute('xmlns')) {
        $namespaces[] = $xmlns = $node->getAttribute('xmlns');
        foreach ($node->attributes as $attr) {
            if ($attr->namespaceURI == $xmlns) {
                $namespaces[] = $attr->value;
            }
        }
    }
    // Optionally walk the child nodes and collect their namespaces too.
    if ($recurse && $node instanceof DomElement) {
        foreach ($node->childNodes as $child) {
            $namespaces = array_merge($namespaces, getNamespaces($child, true));
        }
    }
    return array_unique($namespaces);
}
So, you feed it a DOMElement, and then it finds all related namespaces:
$xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en"
xmlns:foo="http://example.com/bar">
<body>
<h1>foo</h1>
<foo:h2>bar</foo:h2>
</body>
</html>';
$dom = new DOMDocument;
$dom->loadXML($xml);
var_dump(getNamespaces($dom->documentElement, true));
Prints out:
array(2) {
[0]=>
string(28) "http://www.w3.org/1999/xhtml"
[3]=>
string(22) "http://example.com/bar"
}
Note that DomDocument will automatically strip out all unused namespaces...
As for why $dom->documentElement->namespaceURI is always null, it's because the document element doesn't have a namespace when the markup is loaded with loadHTML(). The xmlns attribute is meant to provide a default namespace for the document, but in that case it doesn't endow the html tag with a namespace (for purposes of DOM interaction). You can try doing a $dom->documentElement->removeAttribute('xmlns'), but I'm not 100% sure if it will work...