Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XPath in SimpleXML for default namespaces without needing prefixes

I have an XML document that has a default namespace attached to it, eg

<foo xmlns="http://www.example.com/ns/1.0">
...
</foo>

In reality this is a complex XML document that conforms to a complex schema. My job is to parse out some data from it. To aid me, I have a spreadsheet of XPath. The XPath is rather deeply nested, eg

level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]

The person who generate the XPath is an expert in the schema, so I am going with the assumption that I can't simplify it, or use object traversal shortcuts.

I am using SimpleXML to parse everything out. My problem has to do with how the default namespace gets handled.

Since there is a default namespace on the root element, I can't just do

$xml = simplexml_load_file($somepath);
$node = $xml->xpath('level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]');

I have to register the namespace, assign it to a prefix, and then use the prefix in my XPath, eg

$xml = simplexml_load_file($somepath);
$xml->registerXPathNamespace('myns', 'http://www.example.com/ns/1.0');
$node = $xml->xpath('myns:level1/myns:level2/myns:level3[@foo="bar"]/myns:level4[@foo="bar"]/myns:level5/myns:level6[2]');

Adding the prefixes isn't going to be manageable in the long run.

Is there a proper way to handle default namespaces without needing to using prefixes with XPath?

Using an empty prefix doesn't work ($xml->registerXPathNamespace('', 'http://www.example.com/ns/1.0');). I can string out the default namespace, eg

$xml = file_get_contents($somepath);
$xml = str_replace('xmlns="http://www.example.com/ns/1.0"', '', $xml);
$xml = simplexml_load_string($xml);

but that is skirting the issue.

like image 421
mpdonadio Avatar asked Jan 15 '14 17:01

mpdonadio


People also ask

What is XPath default namespace?

The Default NamespaceXPath treats the empty prefix as the null namespace. In other words, only prefixes mapped to namespaces can be used in XPath queries. This means that if you want to query against a namespace in an XML document, even if it is the default namespace, you need to define a prefix for it.

How do I change the default namespace in XML?

You can change the default namespace within a particular element by adding an xmlns attribute to the element. Example 4-4 is an XML document that initially sets the default namespace to http://www.w3.org/1999/xhtml for all the XHTML elements. This namespace declaration applies within most of the document.

Which XML attribute is used to define namespace?

XML Namespaces - The xmlns Attribute When using prefixes in XML, a namespace for the prefix must be defined. The namespace can be defined by an xmlns attribute in the start tag of an element. The namespace declaration has the following syntax.


1 Answers

From a bit of reading online, this is not restricted to any particular PHP or other library, but to XPath itself - at least in XPath version 1.0

XPath 1.0 does not include any concept of a "default" namespace, so regardless of how the element names appear in the XML source, if they have a namespace bound to them, the selectors for them must be prefixed in basic XPath selectors of the form ns:name. Note that ns is a prefix defined within the XPath processor, not by the document being processed, so has no relationship to how xmlns attributes are used in the XML representation.

See e.g. this "common XSLT mistakes" page, talking about the closely related XSLT 1.0:

To access namespaced elements in XPath, you must define a prefix for their namespace. [...] Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again.

According to an answer to a similar question, XPath 2.0 does include a notion of "default namespace", and the XSLT page linked above mentions this also in the context of XSLT 2.0.

Unfortunately, all of the built-in XML extensions in PHP are built on top of the libxml2 and libxslt libraries, which support only version 1.0 of XPath and XSLT.

So other than pre-processing the document not to use namespaces, your only option would be to find an XPath 2.0 processor that you could plug in to PHP.

(As an aside, it's worth noting that if you have unprefixed attributes in your XML document, they are not technically in the default namespace, but rather in no namespace at all; see XML Namespaces and Unprefixed Attributes for discussion of this oddity of the Namespace spec.)

like image 175
IMSoP Avatar answered Oct 19 '22 05:10

IMSoP