I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this.
Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:
./xpeg "//MainType[@ID=123]"
What seems most strange is this line, without which my approach doesn’t work:
$domdoc->loadXML($domdoc->saveXML($domdoc));
As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn’t be necessary.
Is there a better way to perform xpath queries on this XML in PHP?
XML (note the binding of the default namespace):
<?xml version="1.0" encoding="utf-8"?>
<MyRoot
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
    xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site">
    <Price>$0.20</Price>
    <TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="123" comment="Test site">
    <Price>$99.95</Price>
    <TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="922" comment="Health Insurance">
    <Price>$600.00</Price>
    <TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="389" comment="Used Cars">
    <Price>$5000.00</Price>
    <TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
</MyRoot>
PHP CLI Script:
#!/usr/bin/php-cli
<?php
$xml = file_get_contents("xpeg.xml");
$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");

// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML($domdoc));

$xpath = new DOMXpath($domdoc);
$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
    dump_dom_levels($result);
}
else {
    echo "error\n";
}

// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
    $class = get_class($node);
    if ($class == "DOMNodeList") {
        echo "Level $level ($class): $node->length items\n";
        foreach ($node as $child_node) {
            dump_dom_levels($child_node, $level+1);
        }
    }
    else {
        $nChildren = 0;
        foreach ($node->childNodes as $child_node) {
            if ($child_node->hasChildNodes()) {
                $nChildren++;
            }
        }
        if ($nChildren) {
            echo "Level $level ($class): $nChildren children\n";
        }
        foreach ($node->childNodes as $child_node) {
            if ($child_node->hasChildNodes()) {
                dump_dom_levels($child_node, $level+1);
            }
        }
    }
}
?>
The solution is using the namespace, not getting rid of it.
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("x", trim($argv[2]));
$str = trim($argv[1]);
$result = $xpath->query($str);
And call it like this on the command line (note the x: prefix in the XPath expression):
./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data"
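If passing the namespace URI on the command line feels clumsy, one option is to read it from the document itself. The sketch below is only an illustration: xpath_for_default_ns() is a hypothetical helper, not part of the script above, and it assumes the elements you want to query live in the same namespace as the root element.

```php
<?php
// Hypothetical helper: build a DOMXPath whose $prefix is bound to the
// namespace of the document's root element (if it has one).
function xpath_for_default_ns(DOMDocument $doc, string $prefix = "x"): DOMXPath {
    $xpath = new DOMXPath($doc);
    $uri = $doc->documentElement->namespaceURI;
    if ($uri !== null && $uri !== "") {
        $xpath->registerNamespace($prefix, $uri);
    }
    return $xpath;
}

// A cut-down version of the sample XML, inlined so the sketch is self-contained.
$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>'
    . '</MyRoot>'
);

$xpath = xpath_for_default_ns($doc);
$nodes = $xpath->query('//x:MainType[@ID=123]');
echo $nodes->length, "\n"; // prints 1
```

The XPath expression still has to use the x: prefix; the only thing that changes is where the URI comes from.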
You can make this more shiny by accepting further arguments of the form xyz=http://namespace.uri/ to create custom namespace prefixes before calling $xpath->query().
Bottom line: in XPath you can't query //foo when you really mean //namespace:foo. These are fundamentally different and therefore select different nodes. The fact that XML can have a default namespace defined (and thus can drop explicit namespace usage in the document) does not mean you can drop namespace usage in XPath.
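To make the difference concrete, here is a minimal sketch that registers prefixes from prefix=uri strings and then runs the same query with and without a prefix. register_prefixes() and the d prefix are made up for this example; in the CLI script the bindings would presumably come from the arguments after the XPath expression.

```php
<?php
// Hypothetical helper: register "prefix=uri" strings on a DOMXPath object.
function register_prefixes(DOMXPath $xpath, array $bindings): void {
    foreach ($bindings as $binding) {
        // Split on the first "=" only, since a URI may contain "=" itself.
        $parts = explode("=", $binding, 2);
        if (count($parts) === 2 && $parts[0] !== "") {
            $xpath->registerNamespace($parts[0], $parts[1]);
        }
    }
}

// A cut-down version of the sample XML, inlined so the sketch is self-contained.
$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="192"/><MainType ID="123"/>'
    . '</MyRoot>'
);

$xpath = new DOMXPath($doc);
// In the CLI script this list could be array_slice($argv, 2) (an assumption).
register_prefixes($xpath, ["d=http://www.example.com/data"]);

$unprefixed = $xpath->query('//MainType');   // MainType elements in no namespace
$prefixed   = $xpath->query('//d:MainType'); // MainType elements in the bound namespace

echo "unprefixed: ", $unprefixed->length, "\n"; // unprefixed: 0
echo "prefixed: ", $prefixed->length, "\n";     // prefixed: 2
```

The unprefixed query is not an error; it simply matches nothing, because every MainType element in the document belongs to the http://www.example.com/data namespace.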