While traversing my XmlDocument with XPath, I ran into the problem that the document contains many ugly namespaces, so I started using a NamespaceManager along with the XPath.
The XML looks like this:
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="KA0100401">
<Table>
<Row>
<Cell>Data</Cell>
</Row>
<!-- more rows... -->
</Table>
</Worksheet>
<Worksheet ss:Name="KA0100402">
<!-- ... -->
</Worksheet>
</Workbook>
Now, from what I see from this document, "urn:schemas-microsoft-com:office:spreadsheet"
is the default namespace, because it sits on the root element.
So, naively, I configured my NamespaceManager like this:
XmlDocument document = new XmlDocument();
document.Load(reader);
XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
manager.AddNamespace(String.Empty, "urn:schemas-microsoft-com:office:spreadsheet");
manager.AddNamespace("o", "urn:schemas-microsoft-com:office:office");
manager.AddNamespace("x", "urn:schemas-microsoft-com:office:excel");
manager.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet");
manager.AddNamespace("html", "http://www.w3.org/TR/REC-html40");
But, when I try to access a node
foreach (XmlNode row in document.SelectNodes("/Workbook/Worksheet[1]/Table/Row", manager))
I never get any results. I was under the impression that by registering the first namespace with an empty prefix, I wouldn't need to specify a prefix when searching for nodes in that namespace.
But, as stated in the documentation for the AddNamespace method:
If an XPath expression does not include a prefix, it is assumed that the namespace Uniform Resource Identifier (URI) is the empty namespace.
Why is that? And, more importantly: how do I access nodes in the default namespace, if not using a prefix puts them in the empty namespace?
What good is setting the default namespace on the manager if I can't even access it when searching for nodes?
The Default Namespace
XPath treats the empty prefix as the null namespace. In other words, only prefixes mapped to namespaces can be used in XPath queries. This means that if you want to query against a namespace in an XML document, even if it is the default namespace, you need to define a prefix for it.
XPath queries are aware of namespaces in an XML document and can use namespace prefixes to qualify element and attribute names. Qualifying element and attribute names with a namespace prefix limits the nodes returned by an XPath query to only those nodes that belong to a specific namespace.
Within RUEI, all namespaces used in your XPath queries must be explicitly defined. If a namespace is used in a query, but is not defined, it will not work. To define a namespace, do the following: Select Configuration, then General, Advanced settings, and then XPath namespaces.
One of the primary motivations for defining an XML namespace is to avoid naming conflicts when using and re-using multiple vocabularies. XML Schema is used to create a vocabulary for an XML instance, and uses namespaces heavily.
From the XPath 1.0 spec:
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
So this is not a matter of NamespaceManager but rather of the way XPath is defined to work.
The point that you're missing is that the prefixes you use in your NamespaceManager don't have to be anything like the ones in your XML document. You can use the xcel prefix for urn:schemas-microsoft-com:office:excel if you want, and the sp prefix for urn:schemas-microsoft-com:office:spreadsheet. In fact, you're already assigning a prefix (ss) for that URN in your namespace manager, so you can just use that:
foreach (XmlNode row in
document.SelectNodes("/ss:Workbook/ss:Worksheet[1]/ss:Table/ss:Row", manager))
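To make the difference concrete, here is a minimal sketch that runs both queries against a trimmed-down copy of the workbook from the question. The class name DefaultNamespaceDemo, the helper CountMatches, and the choice of the sp prefix are all illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class DefaultNamespaceDemo
{
    // Trimmed-down copy of the workbook from the question.
    public const string Xml =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"" +
        " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">" +
        "<Worksheet ss:Name=\"KA0100401\"><Table>" +
        "<Row><Cell>Data</Cell></Row>" +
        "</Table></Worksheet></Workbook>";

    // Counts the nodes matched by an XPath expression, using a manager that
    // maps the arbitrary prefix "sp" to the spreadsheet namespace.
    public static int CountMatches(string xpath)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        // The prefix does not have to match anything declared in the document.
        manager.AddNamespace("sp", "urn:schemas-microsoft-com:office:spreadsheet");

        return document.SelectNodes(xpath, manager).Count;
    }
}
```

With this sketch, CountMatches("/Workbook/Worksheet[1]/Table/Row") returns 0, because the unprefixed names are looked up in no namespace, while CountMatches("/sp:Workbook/sp:Worksheet[1]/sp:Table/sp:Row") returns 1.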
Regarding this question:
What good is setting the default namespace on the manager if I can't even access it when searching for nodes?
The good is that XmlNamespaceManager is used for more than just evaluating XPath. For example, it could be used to keep track of the namespaces in an XML document, in which there is a concept of default namespaces.
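As a small illustration of that bookkeeping role, the sketch below registers a default namespace and reads it back, with no XPath involved. The class and method names are hypothetical; LookupNamespace and the DefaultNamespace property are the XmlNamespaceManager members being demonstrated.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class ManagerLookupDemo
{
    private const string SpreadsheetUri = "urn:schemas-microsoft-com:office:spreadsheet";

    // Registers a default namespace (empty prefix) and returns it via a lookup.
    public static string LookupDefault()
    {
        XmlNamespaceManager manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace(String.Empty, SpreadsheetUri);

        // Passing String.Empty asks for the default namespace.
        return manager.LookupNamespace(String.Empty);
    }

    // Same registration, read back through the DefaultNamespace property.
    public static string DefaultProperty()
    {
        XmlNamespaceManager manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace(String.Empty, SpreadsheetUri);

        return manager.DefaultNamespace;
    }
}
```

Both methods return the spreadsheet URN, so the manager does remember the default namespace; XPath evaluation simply never consults it for unprefixed names.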
@JLRishe's answer is correct for accessing nodes in the default namespace (i.e. always mapping a prefix to the default namespace in the XmlNamespaceManager).
Reading the entire context of the link from your quote (MSDN XmlNamespaceManager.AddNamespace), it states that the default "empty" prefix is not used in XPath expressions.
prefix Type: System.String
The prefix to associate with the namespace being added. Use String.Empty to add a default namespace.
Note If the XmlNamespaceManager will be used for resolving namespaces in an XML Path Language (XPath) expression, a prefix must be specified. If an XPath expression does not include a prefix, it is assumed that the namespace Uniform Resource Identifier (URI) is the empty namespace. For more information about XPath expressions and the XmlNamespaceManager, refer to the XmlNode.SelectNodes and XPathExpression.SetContext methods.
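The note also points to XPathExpression.SetContext, where the same rule applies. Below is a rough sketch of that route using XPathNavigator; the class name, method name, inline XML, and the sp prefix are assumptions made for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class SetContextDemo
{
    private const string SpreadsheetUri = "urn:schemas-microsoft-com:office:spreadsheet";

    // Compiles an XPath expression, binds a namespace manager to it with
    // SetContext, and returns the number of nodes it selects.
    public static int CountRows(string xpath)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(
            "<Workbook xmlns=\"" + SpreadsheetUri + "\">" +
            "<Worksheet><Table><Row/><Row/></Table></Worksheet></Workbook>");

        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        // A non-empty prefix is required for the manager to be usable in XPath.
        manager.AddNamespace("sp", SpreadsheetUri);

        XPathNavigator navigator = document.CreateNavigator();
        XPathExpression expression = navigator.Compile(xpath);
        expression.SetContext(manager);

        return navigator.Select(expression).Count;
    }
}
```

Here CountRows("//sp:Row") returns 2, while CountRows("//Row") returns 0, matching the behavior described in the note.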