
xmlNode.SelectSingleNode always returns same value even though the node changes

Tags: c#, xml

I am reading in a bunch of XML files, transforming them, and loading the data into another system.

Previously I did this using ThreadPool. However, the provider of the files has changed, and with it the structure, so I'm now trying async/await and getting an odd result.

As I process the files, I get a list of the XmlNodes and loop over them:

foreach (XmlNode currentVenue in venueNodes)
{
      Console.WriteLine(currentVenue.OuterXml);
      Console.WriteLine(currentVenue.SelectSingleNode(@"//venueName").InnerText);
}

However, the second WriteLine always returns the result expected for the first node. For example:

<venue venueID="xartrix" lastModified="2012-08-20 10:49:30"><venueName>Artrix</venueName></venue>
Artrix
<venue venueID="xbarins" lastModified="2013-04-29 11:39:07"><venueName>The Barber Institute Of Fine Arts, University Of Birmingham</venueName></venue>
Artrix
<venue venueID="xbirmus" lastModified="2012-11-13 16:41:13"><venueName>Birmingham Museum &amp; Art Gallery</venueName></venue>
Artrix

Here is the complete code:

public async Task ProcessFiles()
{
    string[] filesToProcess = Directory.GetFiles(_filePath);
    List<Task> tasks = new List<Task>();

    foreach (string currentFile in filesToProcess)
    {
        tasks.Add(Task.Run(()=>processFile(currentFile)));
    }

    await Task.WhenAll(tasks);

}

private async Task processFile(string currentFile)
{
    try
    {
        XmlDocument currentXmlFile = new XmlDocument();
        currentXmlFile.Load(currentFile);

        //select nodes for processing
        XmlNodeList venueNodes = currentXmlFile.SelectNodes(@"//venue");

        foreach (XmlNode currentVenue in venueNodes)
        {
            Console.WriteLine(currentVenue.InnerXml);
            Console.WriteLine(currentVenue.SelectSingleNode(@"//venueName").InnerText);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}

Obviously I've missed something, but I cannot see what. Can someone point it out, please?

Stuart asked Dec 21 '22 05:12


2 Answers

SelectSingleNode returns only the first node, in document order, that matches the XPath expression. @jbl is correct: an expression that begins with // is evaluated from the document root, no matter which node you call it on. In XPath, // is shorthand for /descendant-or-self::node()/, so //venueName matches every venueName element anywhere in the document.

I work with XML and XPath often, and this is a common mistake. You need to make sure that your context node is correct when calling SelectSingleNode. Because //venueName is anchored at the root, it always returns the first <venueName /> node in the document, regardless of which venue you are currently iterating over.

To get the <venueName /> node that is a child of the node you're currently iterating over, make the expression relative to that node:

foreach (XmlNode currentVenue in venueNodes)
{
    Console.WriteLine(currentVenue.OuterXml);
    Console.WriteLine(currentVenue.SelectSingleNode(@".//venueName").InnerText); // The '.' means from the current node. Without it, searching starts from the document root, not currentVenue.
}

That should solve your problem.
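To see the difference in isolation, here is a minimal sketch that runs both expressions against a small in-memory document. The <venues> wrapper element, the class name XPathContextDemo, and the helper NamesUsing are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Xml;

public static class XPathContextDemo
{
    // Hypothetical helper: for each <venue>, evaluates the given XPath
    // with that venue as the context node and collects the resulting text.
    public static string[] NamesUsing(string xpath)
    {
        var doc = new XmlDocument();
        // Assumed sample data, modelled on the output shown in the question.
        doc.LoadXml(
            "<venues>" +
            "<venue venueID=\"xartrix\"><venueName>Artrix</venueName></venue>" +
            "<venue venueID=\"xbirmus\"><venueName>Birmingham Museum &amp; Art Gallery</venueName></venue>" +
            "</venues>");

        XmlNodeList venues = doc.SelectNodes("//venue");
        var names = new string[venues.Count];
        for (int i = 0; i < venues.Count; i++)
        {
            // ?. guards against a venue that has no matching node.
            names[i] = venues[i].SelectSingleNode(xpath)?.InnerText;
        }
        return names;
    }

    public static void Main()
    {
        // Rooted expression: the context node is ignored.
        Console.WriteLine(string.Join(" | ", NamesUsing("//venueName")));
        // prints "Artrix | Artrix"

        // Relative expression: the search starts at each venue.
        Console.WriteLine(string.Join(" | ", NamesUsing(".//venueName")));
        // prints "Artrix | Birmingham Museum & Art Gallery"
    }
}
```

The only thing that changes between the two calls is the leading dot, which is what ties the search to the current venue.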

fourpastmidnight answered May 11 '23 06:05


Doesn't //venueName search from the document root?

I guess that, combined with SelectSingleNode, it will always end up on the same node (the first venueName node in the document).

You could try replacing //venueName with venueName.
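A bare venueName step only matches direct children of the context node, whereas .//venueName also matches deeper descendants. The sketch below illustrates that difference. The nested <details> element, the class name ChildStepDemo, and the helper NameOrDefault are hypothetical and exist only to show the behaviour; the files in the question have venueName as a direct child, so either form would work there.

```csharp
using System;
using System.Xml;

public static class ChildStepDemo
{
    // Hypothetical helper: returns the matched node's text, or a marker
    // string when the XPath matches nothing (SelectSingleNode returns null).
    public static string NameOrDefault(XmlNode venue, string xpath)
    {
        XmlNode match = venue.SelectSingleNode(xpath);
        return match != null ? match.InnerText : "(no match)";
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Assumed sample data: the second venue nests its name one level deeper.
        doc.LoadXml(
            "<venues>" +
            "<venue><venueName>Artrix</venueName></venue>" +
            "<venue><details><venueName>Nested</venueName></details></venue>" +
            "</venues>");

        foreach (XmlNode venue in doc.SelectNodes("/venues/venue"))
        {
            Console.WriteLine(
                NameOrDefault(venue, "venueName") + " / " +
                NameOrDefault(venue, ".//venueName"));
        }
        // prints "Artrix / Artrix"
        // prints "(no match) / Nested"
    }
}
```

Checking for null before reading InnerText also avoids a NullReferenceException when a venue has no name element at all.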

jbl answered May 11 '23 06:05