Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# Using XPath with XmlDocument - Can't select nodes in a namespace (returning null)

Tags:

c#

xml

xpath

I'm trying to do something which ought to be quite simple but I'm having terrible trouble. I have tried code from multiple similar questions in StackOverflow but to no avail. I'm trying to get various pieces of information from an ABN lookup with the Australian government. Here is anonymised return XML value:

    <?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">
            <ABRPayloadSearchResults>
                <request>
                    <identifierSearchRequest>
                        <authenticationGUID>00000000-0000-0000-0000-000000000000</authenticationGUID>
                        <identifierType>ABN</identifierType>
                        <identifierValue>00 000 000 000</identifierValue>
                        <history>N</history>
                    </identifierSearchRequest>
                </request>
                <response>
                    <usageStatement>The Registrar of the ABR monitors the quality of the information available on this website and updates the information regularly. However, neither the Registrar of the ABR nor the Commonwealth guarantee that the information available through this service (including search results) is accurate, up to date, complete or accept any liability arising from the use of or reliance upon this site.</usageStatement>
                    <dateRegisterLastUpdated>2017-01-01</dateRegisterLastUpdated>
                    <dateTimeRetrieved>2017-01-01T00:00:00.2016832+10:00</dateTimeRetrieved>
                    <businessEntity>
                        <recordLastUpdatedDate>2017-01-01</recordLastUpdatedDate>
                        <ABN>
                            <identifierValue>00000000000</identifierValue>
                            <isCurrentIndicator>Y</isCurrentIndicator>
                            <replacedFrom>0001-01-01</replacedFrom>
                        </ABN>
                        <entityStatus>
                            <entityStatusCode>Active</entityStatusCode>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </entityStatus>
                        <ASICNumber>000000000</ASICNumber>
                        <entityType>
                            <entityTypeCode>PRV</entityTypeCode>
                            <entityDescription>Australian Private Company</entityDescription>
                        </entityType>
                        <goodsAndServicesTax>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </goodsAndServicesTax>
                        <mainName>
                            <organisationName>COMPANY LTD</organisationName>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                        </mainName>
                        <mainBusinessPhysicalAddress>
                            <stateCode>NSW</stateCode>
                            <postcode>0000</postcode>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </mainBusinessPhysicalAddress>
                    </businessEntity>
                </response>
            </ABRPayloadSearchResults>
        </ABRSearchByABNResponse>
    </soap:Body>
</soap:Envelope>

so I want to get for example the whole response using xpath="//response" then use various xpath statement within that node to get the <organisationName> ("//mainName/organisationName") and other values. It should be simple right? Those xpath statements appear to work when testing in Notepad++but I use this code in Visual Studio:

XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(ipxml);
XmlNode xnode = xdoc.SelectSingleNode("//response");
XmlNodeList xlist = xdoc.SelectNodes("//mainName/organisationName");
xlist = xdoc.GetElementsByTagName("mainName");

But it always returns null, whatever I put in the xpath I get a null return for the node and 0 count for the list whether I'm selecting something with child nodes, a value or not. I can get the nodes using GetElementsByTagName() as in the example which returns the correct node, but I wanted to do it 'properly' selecting the proper field using xpath.

I also tried using XElement and Linq but still no luck. Is there something weird about the XML?

I'm sure it must something simple but I've been struggling for ages.

like image 698
Chris H Avatar asked Nov 03 '25 21:11

Chris H


2 Answers

You aren't dealing with the namespaces present in the document. Specifically, the high level element:

<ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">

places ABRSearchByABNResponse, and all its child elements (unless overridden by another xmlns) into the namespace http://abr.business.gov.au/ABRXMLSearch/. In order to navigate to these nodes (without hacks like GetElementsByTagName or using local-name()), you'll need to register the namespaces with an XmlNamespaceManager, like so. The xmlns aliases don't necessarily need to match those used in the original document, but it's a good convention to do so:

XmlDocument

var xdoc = new XmlDocument();
var ns = new XmlNamespaceManager(xdoc.NameTable);
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

xdoc.LoadXml(ipxml);
// NB need to use the overload accepting a namespace
var xresponse = xdoc.SelectSingleNode("//abr:response", ns);
var xlist = xdoc.SelectNodes("//abr:mainName/abr:organisationName", ns);

XDocument

More recently, the powers of LINQ can be harnessed with XDocument, which makes working with namespaces much easier (Descendants finds child nodes at any depth)

var xdoc = XDocument.Parse(ipxml);
XNamespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
XNamespace abr = "http://abr.business.gov.au/ABRXMLSearch/";

var xresponse = xdoc.Descendants(abr + "response");
var xlist = xdoc.Descendants(abr + "organisationName");

XDocument + XPath

You can also resort to using XPath in Linq to Xml, especially for more complicated expressions:

var xdoc = XDocument.Parse(ipxml);
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

var xresponse = xdoc.XPathSelectElement("//abr:response", ns);
var xlist = xdoc.XPathSelectElement("//abr:mainName/abr:organisationName", ns);
like image 70
StuartLC Avatar answered Nov 05 '25 13:11

StuartLC


You need to call SelectSingleNode and SelectNodes on the DocumentElement. You are calling them on the document itself.

For example:

XmlNode xnode = xdoc.DocumentElement.SelectSingleNode("//response");
like image 28
Philip Smith Avatar answered Nov 05 '25 11:11

Philip Smith