How does XPath deal with XML namespaces?

If I use

/IntuitResponse/QueryResponse/Bill/Id 

to parse the XML document below I get 0 nodes back.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
                    time="2016-10-14T10:48:39.109-07:00">
        <QueryResponse startPosition="1" maxResults="79" totalCount="79">
            <Bill domain="QBO" sparse="false">
                <Id>=1</Id>
            </Bill>
        </QueryResponse>
    </IntuitResponse>

However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
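To make the failure concrete, here is a small sketch in Python using the standard library's xml.etree.ElementTree. ElementTree supports only a limited subset of XPath, but it applies the same naming rule: the default xmlns puts every element of the document in the http://schema.intuit.com/finance/v3 namespace, while a bare name such as Id only matches an element that is in no namespace. The variable names are illustrative, and the XML declaration is left out of the string for brevity.

```python
import xml.etree.ElementTree as ET

# The document from the question; xmlns="..." puts every element in this namespace.
XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
                time="2016-10-14T10:48:39.109-07:00">
    <QueryResponse startPosition="1" maxResults="79" totalCount="79">
        <Bill domain="QBO" sparse="false">
            <Id>=1</Id>
        </Bill>
    </QueryResponse>
</IntuitResponse>"""

root = ET.fromstring(XML)

# ElementTree reports each element's expanded name as {namespace-uri}local-name.
print(root.tag)  # {http://schema.intuit.com/finance/v3}IntuitResponse

# Unqualified names mean "no namespace", so this finds nothing (like the question's XPath).
unqualified = root.findall("QueryResponse/Bill/Id")

# Qualifying every step with the namespace URI finds the element.
NS = "{http://schema.intuit.com/finance/v3}"
qualified = root.findall(f"{NS}QueryResponse/{NS}Bill/{NS}Id")

print(len(unqualified), len(qualified), qualified[0].text)  # 0 1 =1
```

In other words, the processor never has to guess: Id in the path and Id in the document simply have different expanded names.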

Adam, asked Nov 25 '16 00:11



1 Answer

Defining namespaces in XPath (recommended)

XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.

It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
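To illustrate that the prefix is only a local alias chosen by the caller, here is a minimal sketch with Python's xml.etree.ElementTree, whose find/findall methods accept a prefix-to-URI mapping through the namespaces argument. The document uses no prefix at all (only a default xmlns), yet any prefix bound to the right URI works, and a prefix bound to a different URI matches nothing. The prefixes i, qb and other, the URI http://example.com/other, and the helper ids are arbitrary examples.

```python
import xml.etree.ElementTree as ET

# The namespace is declared as a default (unprefixed) xmlns in the document.
XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">
  <QueryResponse><Bill><Id>=1</Id></Bill></QueryResponse>
</IntuitResponse>"""
root = ET.fromstring(XML)

INTUIT = "http://schema.intuit.com/finance/v3"

def ids(prefix, uri):
    """Return the text of Bill/Id elements, using `prefix` bound to `uri` by the caller."""
    path = f"{prefix}:QueryResponse/{prefix}:Bill/{prefix}:Id"
    return [e.text for e in root.findall(path, namespaces={prefix: uri})]

print(ids("i", INTUIT))                          # ['=1']
print(ids("qb", INTUIT))                         # ['=1']  (the prefix name is irrelevant)
print(ids("other", "http://example.com/other"))  # []      (only the URI matters)
```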


Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.

(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)

C#:

    // doc is an XmlDocument into which the response XML has already been loaded.
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
    XmlNodeList nodes = doc.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);

Java (SAX):

    import org.xml.sax.helpers.NamespaceSupport;

    NamespaceSupport support = new NamespaceSupport();
    support.pushContext();
    support.declarePrefix("i", "http://schema.intuit.com/finance/v3");

Java (XPath):

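    import java.util.Iterator;
    import javax.xml.XMLConstants;
    import javax.xml.namespace.NamespaceContext;

    // xpath is a javax.xml.xpath.XPath, e.g. from XPathFactory.newInstance().newXPath()
    xpath.setNamespaceContext(new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            switch (prefix) {
                case "i": return "http://schema.intuit.com/finance/v3";
                // ...
                default: return XMLConstants.NULL_NS_URI;
            }
        }
        // Only getNamespaceURI is needed to resolve prefixes; the interface also requires these.
        public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
        public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
    });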
  • Remember to call DocumentBuilderFactory.setNamespaceAware(true).
  • See also: Java XPath: Queries with default namespace xmlns

JavaScript:

See Implementing a User Defined Namespace Resolver:

    function nsResolver(prefix) {
      var ns = {
        'i' : 'http://schema.intuit.com/finance/v3'
      };
      return ns[prefix] || null;
    }
    document.evaluate('/i:IntuitResponse/i:QueryResponse',
                      document, nsResolver, XPathResult.ANY_TYPE,
                      null);

Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().

Perl (LibXML):

    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
    my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');

Python (lxml):

    from io import StringIO
    from lxml import etree

    f = StringIO('<IntuitResponse>...</IntuitResponse>')
    doc = etree.parse(f)
    r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
                  namespaces={'i': 'http://schema.intuit.com/finance/v3'})

Python (ElementTree):

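    import xml.etree.ElementTree as ET

    namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
    # root is the IntuitResponse element, e.g. root = ET.fromstring(xml).
    # ElementTree's limited XPath rejects absolute paths such as
    # '/i:IntuitResponse/i:QueryResponse' on an element, so search relative to root.
    root.findall('i:QueryResponse', namespaces)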

Python (Scrapy):

    response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
    response.xpath('/i:IntuitResponse/i:QueryResponse').getall()

PHP:

Adapted from @Tomalak's answer using DOMDocument:

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");

    $result = $xpath->query("/i:IntuitResponse/i:QueryResponse");

See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.

Ruby (Nokogiri):

    puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
                   'i' => "http://schema.intuit.com/finance/v3")

Note that Nokogiri supports removal of namespaces,

doc.remove_namespaces! 

but see the warnings below against defeating XML namespaces.
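As a rough illustration of what stripping namespaces does, and why it is discouraged, here is a sketch in Python's xml.etree.ElementTree. strip_namespaces is a hypothetical helper written for this example, not Nokogiri's implementation, and it rewrites element names only, not attribute names. The document is a made-up variant of the question's, with an extra Id element from a hypothetical namespace http://example.com/other.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Hypothetical helper: drop the {namespace-uri} part of every element tag, in place."""
    for el in root.iter():
        # Comments/PIs have non-string tags; only rewrite namespaced element names.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root

XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
                xmlns:x="http://example.com/other">
  <QueryResponse><Bill><Id>=1</Id><x:Id>other</x:Id></Bill></QueryResponse>
</IntuitResponse>"""

root = strip_namespaces(ET.fromstring(XML))

# The question's unqualified path now matches, but so does the foreign Id.
ids = [e.text for e in root.findall("QueryResponse/Bill/Id")]
print(ids)  # ['=1', 'other']
```

After stripping, the unqualified path works, but two elements that were distinct before now have identical names.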

VBA:

    xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
    doc.setProperty "SelectionNamespaces", xmlNS
    Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")

VB.NET:

    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load("file.xml")
    ' Use the document's own name table (XmlNameTable is abstract and cannot be instantiated).
    Dim nsmgr As New XmlNamespaceManager(xmlDoc.NameTable)
    nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
    Dim nodes As XmlNodeList = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse", nsmgr)

SoapUI (doc):

    declare namespace i='http://schema.intuit.com/finance/v3';
    /i:IntuitResponse/i:QueryResponse

xmlstarlet:

-N i="http://schema.intuit.com/finance/v3" 

XSLT:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:i="http://schema.intuit.com/finance/v3">
       ...
    </xsl:stylesheet>

Once you've declared a namespace prefix, your XPath can be written to use it:

/i:IntuitResponse/i:QueryResponse 

Defeating namespaces in XPath (not recommended)

An alternative is to write predicates that test against local-name():

/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse'] 

Or, in XPath 2.0:

/*:IntuitResponse/*:QueryResponse 

Skirting namespaces in this manner works but is not recommended because it

  • Under-specifies the full element/attribute name.

  • Fails to differentiate between element/attribute names in different namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly¹:

     /*[    namespace-uri()='http://schema.intuit.com/finance/v3'
        and local-name()='IntuitResponse']
     /*[    namespace-uri()='http://schema.intuit.com/finance/v3'
        and local-name()='QueryResponse']

    ¹ Thanks to Daniel Haley for the namespace-uri() note.

  • Is excessively verbose.
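The second point can be seen with a small Python sketch using xml.etree.ElementTree. Its {*}name wildcard (Python 3.8 or later) plays the role of *:name or a bare local-name() test. The document is a made-up variant containing an Id from a hypothetical namespace http://example.com/other. The wildcard matches both Id elements, while the fully qualified {uri}Id name, ElementTree's counterpart of testing namespace-uri() and local-name() together, matches only the intended one.

```python
import xml.etree.ElementTree as ET

XML = """<Bill xmlns="http://schema.intuit.com/finance/v3"
      xmlns:x="http://example.com/other">
  <Id>=1</Id>
  <x:Id>not-a-bill-id</x:Id>
</Bill>"""
root = ET.fromstring(XML)

# Namespace-agnostic match: under-specified, so it catches both Id elements.
loose = [e.text for e in root.findall("{*}Id")]

# Fully qualified match: namespace URI plus local name.
strict = [e.text for e in root.findall("{http://schema.intuit.com/finance/v3}Id")]

print(loose)   # ['=1', 'not-a-bill-id']
print(strict)  # ['=1']
```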

kjhughes, answered Sep 29 '22 13:09