 

PHP XPath search returning 0 results

Tags: php, xml, xpath

Below I have a PHP script that needs to search through an XML file and find the ID inside <AnotherChild>. At the moment it returns 0 results and I can't figure out why. If anyone can spot the reason, I'd really appreciate it.

XML:

<TransXChange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.transxchange.org.uk/" xsi:schemaLocation="http://www.transxchange.org.uk/ http://www.transxchange.org.uk/schema/2.1/TransXChange_general.xsd" CreationDateTime="2013-07-12T18:12:21.8122032+01:00" ModificationDateTime="2013-07-12T18:12:21.8122032+01:00" Modification="new" RevisionNumber="3" FileName="swe_44-611A-1-y10.xml" SchemaVersion="2.1">
    <Node1>...</Node1>
    <Node2>...</Node2>
    <Node3>...</Node3>
    <Node4>...</Node4>
    <Node5>...</Node5>
    <Node6>...</Node6>
    <Node7>
        <Child>
            <id>ABCDEFG123</id>
        </Child>
        <AnotherChild>
            <id>ABCDEFG124</id>
        </AnotherChild>
    </Node7>
    <Node8>...</Node8>
</TransXChange>

PHP:

<?php

  $xmldoc = new DOMDocument();
  $xmldoc->load("directory1/directory2/file.xml");

  $xpathvar = new DOMXPath($xmldoc);
  $xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

  $queryResult = $xpathvar->query('//AnotherChild/id');
  foreach($queryResult as $result) {
    echo $result->textContent;
  }
?>

Thanks

asked by jskidd3, Aug 08 '13 at 21:08



2 Answers

The two questions linked in the comments do actually answer this question, but IMO they don't make it clear enough why, so I'm adding this answer following our discussion in chat.


Consider the following XML document:

<root>
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>

This has no xmlns attributes at all, which means you can query //grandchild and get the result you expect. No element is in any namespace, so everything can be addressed without registering a namespace in XPath.
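A minimal sketch confirming this, assuming the document is held in a string rather than a file (the variable names are illustrative, not from the question):

```php
<?php

// The first example document: no xmlns attribute anywhere.
$plainXml = <<<XML
<root>
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>
XML;

$plainDoc = new DOMDocument();
$plainDoc->loadXML($plainXml);

$plainXpath = new DOMXPath($plainDoc);

// No registration needed: an unprefixed name matches elements in no namespace.
$plainNodes = $plainXpath->query('//grandchild');

echo $plainNodes->length, "\n";               // 1
echo $plainNodes->item(0)->textContent, "\n"; // foo
```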

Now consider this:

<root xmlns="http://www.bar.com/">
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>

This declares a default namespace of http://www.bar.com/, which every element in the document inherits, and as a result you must use that namespace to address any of those elements.
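Running the same //grandchild query against this document reproduces the 0 results from the question. A minimal sketch (variable names are illustrative); the namespaceURI lookup is only there to show which namespace the element actually ended up in:

```php
<?php

// The second example document: a default namespace on <root>,
// inherited by every descendant element.
$nsXml = <<<XML
<root xmlns="http://www.bar.com/">
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>
XML;

$nsDoc = new DOMDocument();
$nsDoc->loadXML($nsXml);

$nsXpath = new DOMXPath($nsDoc);

// Same query as before: it now matches nothing, because //grandchild
// asks for a <grandchild> element in no namespace.
$nsNodes = $nsXpath->query('//grandchild');
echo $nsNodes->length, "\n"; // 0

// The element is still there; it just belongs to http://www.bar.com/.
$element = $nsDoc->getElementsByTagName('grandchild')->item(0);
echo $element->namespaceURI, "\n"; // http://www.bar.com/
```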

As you have already figured out, the way to do this is to use DOMXPath::registerNamespace(), but the crucial point you missed is that (in PHP's XPath implementation) every namespace must be registered with a prefix, and you must use that prefix to address nodes that belong to it. It is not possible to register a namespace in XPath with an empty prefix: in XPath 1.0, which is what DOMXPath implements, an unprefixed name in a query always refers to an element in no namespace, whatever namespaces have been registered.
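Registering the namespace on its own changes nothing about unprefixed names, which is exactly the situation in the question. A minimal sketch mirroring that mistake on the second example document (the bar prefix and variable names are illustrative):

```php
<?php

$xml = <<<XML
<root xmlns="http://www.bar.com/">
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// The registration itself succeeds...
$registered = $xpath->registerNamespace('bar', 'http://www.bar.com/');
var_dump($registered); // bool(true)

// ...but the query never uses the prefix, so it still looks for
// <grandchild> in no namespace and finds nothing.
$unprefixed = $xpath->query('//grandchild');
echo $unprefixed->length, "\n"; // 0
```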

So, given the second example above, let's look at how we would execute the original //grandchild query:

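<?php

    // The second example document, inlined so the snippet runs on its own
    $xml = '<root xmlns="http://www.bar.com/"><child><grandchild>foo</grandchild></child></root>';

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // Map a prefix of our choosing to the namespace URI used by the document
    $xpath->registerNamespace('bar', 'http://www.bar.com/');

    // The element step in the query carries the registered prefix
    $nodes = $xpath->query('//bar:grandchild');
    foreach($nodes as $node) {
        // do stuff with $node, e.g. print its text ("foo")
        echo $node->textContent;
    }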

Note how we registered the namespace using its URI, and we specified a prefix. Even though the original XML did not contain this prefix, we use the prefix in the query.

To understand why, let's look at another piece of XML:

<baz:root xmlns:baz="http://www.bar.com/">
  <baz:child>
    <baz:grandchild>foo</baz:grandchild>
  </baz:child>
</baz:root>

This document is semantically identical to the second one, and the code sample would work equally well with either. The prefix is separate from the namespace. Note that even though this document uses a baz: prefix, the XPath query uses the bar: prefix. This is because the thing that identifies the namespace is the URI, not the prefix.
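The equivalence can be checked by running one unchanged bar: query over both documents. A sketch; the helper name findGrandchildren is hypothetical:

```php
<?php

// Runs the same prefixed query over any document and returns the text values.
function findGrandchildren(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // The prefix chosen here is independent of any prefix used in the document.
    $xpath->registerNamespace('bar', 'http://www.bar.com/');

    $values = [];
    foreach ($xpath->query('//bar:grandchild') as $node) {
        $values[] = $node->textContent;
    }
    return $values;
}

// Default namespace, no prefix in the document.
$defaultNs = <<<XML
<root xmlns="http://www.bar.com/">
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>
XML;

// Same namespace URI, bound to the baz: prefix in the document.
$prefixedNs = <<<XML
<baz:root xmlns:baz="http://www.bar.com/">
  <baz:child>
    <baz:grandchild>foo</baz:grandchild>
  </baz:child>
</baz:root>
XML;

var_dump(findGrandchildren($defaultNs) === findGrandchildren($prefixedNs)); // bool(true)
```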

So when a document uses a namespace, we must work with the namespace, not against it, by registering the namespace in XPath and using the prefix we registered it with to refer to any nodes that belong to that namespace.

For completeness, when we apply these principles to your original document, the query that you would use with the code in the question is:

//transXchange:AnotherChild/transXchange:id
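A runnable sketch of that query against a trimmed copy of the question's file. Assumptions: only the default namespace and <Node7> are kept, and loadXML() replaces load() purely so the example runs without the file. It also shows that prefixing only the first step is not enough, because <id> inherits the same namespace:

```php
<?php

// Trimmed stand-in for directory1/directory2/file.xml.
$xml = <<<XML
<TransXChange xmlns="http://www.transxchange.org.uk/">
    <Node7>
        <Child>
            <id>ABCDEFG123</id>
        </Child>
        <AnotherChild>
            <id>ABCDEFG124</id>
        </AnotherChild>
    </Node7>
</TransXChange>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);

$xpathvar = new DOMXPath($xmldoc);
$xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

// Only the first step prefixed: the unprefixed id step looks in no namespace.
$partial = $xpathvar->query('//transXchange:AnotherChild/id')->length;
echo $partial, "\n"; // 0

// Every step prefixed: matches the <id> under <AnotherChild>.
$ids = [];
foreach ($xpathvar->query('//transXchange:AnotherChild/transXchange:id') as $result) {
    $ids[] = $result->textContent;
}
echo implode("\n", $ids), "\n"; // ABCDEFG124
```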
answered by DaveRandom, Sep 21 '22 at 19:09



To fix this problem I first registered the namespace:

$xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

And then modified the query like so:

$queryResult = $xpathvar->query('//transXchange:AnotherChild/transXchange:id');

This returned the ID successfully.
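A possible variation, offered as a sketch rather than what was done here: read the namespace URI from the root element instead of hard-coding it. This assumes the elements being queried share the root element's namespace, which holds for this file but not for every document; the tx prefix and the trimmed XML are illustrative.

```php
<?php

$xml = <<<XML
<TransXChange xmlns="http://www.transxchange.org.uk/">
    <Node7>
        <AnotherChild>
            <id>ABCDEFG124</id>
        </AnotherChild>
    </Node7>
</TransXChange>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);

// Assumption: the target elements live in the root element's namespace.
$nsUri = $xmldoc->documentElement->namespaceURI; // http://www.transxchange.org.uk/

$xpathvar = new DOMXPath($xmldoc);
$xpathvar->registerNamespace('tx', $nsUri);

$id = $xpathvar->query('//tx:AnotherChild/tx:id')->item(0)->textContent;
echo $id, "\n"; // ABCDEFG124
```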

answered by jskidd3, Sep 23 '22 at 19:09
