 

C# Find And Replace XML Nodes

Edit: I decided to take the LINQ to XML approach recommended in the answer below, and everything works EXCEPT that I can't replace the changed records in the full file with the records from the incremental file. I managed to make the program work by removing the full-file node and then adding the incremental node. Is there a way to just swap them instead? Also, while this solution is very nice, is there any way to reduce memory usage without losing the LINQ code? This solution may still work, but I would be willing to trade speed for lower memory usage.
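On the memory question: one common way to keep memory roughly flat is to stream the full file with `XmlReader` and materialize only one `<Person>` at a time with `XNode.ReadFrom`, so LINQ to XML can still be used per record. Below is a minimal sketch, not a drop-in implementation. It assumes the element and attribute names shown in the question, an incremental file small enough to load into an `XDocument`, unique ids, and that new ("add") records may simply be appended at the end of `<Records>`. The function name `StreamingMerge` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: streams the full file record by record and writes the merged
// result to a separate output. Assumes List/Records/Person with "id" and
// "recordaction" attributes, as in the question, and unique ids.
void StreamingMerge(TextReader fullIn, XDocument incremental, TextWriter output)
{
    // Only the (small) incremental file is held in memory, indexed by id.
    var changes = incremental.Descendants("Person")
        .ToDictionary(p => (string)p.Attribute("id"));

    using var reader = XmlReader.Create(fullIn);
    using var writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true });

    writer.WriteStartElement("List");
    writer.WriteStartElement("Records");

    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Person")
        {
            // ReadFrom builds just this one element and leaves the reader on the next node.
            var person = (XElement)XNode.ReadFrom(reader);
            if (changes.TryGetValue((string)person.Attribute("id"), out var inc))
            {
                // "chg" writes the incremental version in place; "del" writes nothing.
                if ((string)inc.Attribute("recordaction") == "chg") inc.WriteTo(writer);
            }
            else
            {
                person.WriteTo(writer); // unchanged record is copied through
            }
        }
        else
        {
            reader.Read(); // skip <List>, <Records>, whitespace, end tags
        }
    }

    // New records are appended at the end of <Records>.
    foreach (var added in changes.Values.Where(p => (string)p.Attribute("recordaction") == "add"))
        added.WriteTo(writer);

    writer.WriteEndElement(); // </Records>
    writer.WriteEndElement(); // </List>
}
```

To run it against files on disk, open the full file with `File.OpenText`, write to a different file (for example via `File.CreateText`), and dispose both afterwards; the output cannot be the file that is still being read.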


I'm trying to take two XML files (a full file and an incremental file) and merge them together. The XML file looks like this:

<List>
    <Records>
        <Person id="001" recordaction="add">
            ...
        </Person>
    </Records>
</List>

The recordaction attribute can also be "chg" for changes or "del" for deletes. The basic logic of my program is:

1) Read the full file into an XmlDocument.

2) Read the incremental file into an XmlDocument, select the nodes using XmlDocument.SelectNodes(), place those nodes into a dictionary for easier searching.

3) Select all the nodes in the full file, loop through them, and check each against the dictionary containing the incremental records. If recordaction="chg" or "del", add the node to a list, then delete all the nodes from the XmlNodeList that are in that list. Finally, add the recordaction="chg" or "add" records from the incremental file into the full file.

4) Save the XML file.
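Steps 1 and 4 map directly onto `XmlDocument.Load` and `XmlDocument.Save`. For step 2, a minimal sketch of building the lookup dictionary might look like this. The helper name `IndexById` is hypothetical, and it assumes ids are unique within the incremental file. It reads the id by attribute name rather than by position, since `Attributes[0]` silently depends on `id` being the first attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for step 2: index incremental <Person> records by "id".
// Assumes unique ids; a duplicate id would overwrite the earlier entry.
Dictionary<string, XmlNode> IndexById(XmlDocument incremental)
{
    var dict = new Dictionary<string, XmlNode>();
    foreach (XmlNode person in incremental.SelectNodes("/List/Records/Person"))
    {
        // Look the attribute up by name instead of relying on Attributes[0].
        dict[person.Attributes["id"].Value] = person;
    }
    return dict;
}
```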

I'm having some serious problems with step 3. Here's the code for that function:

private void ProcessChanges(XmlNodeList nodeList, Dictionary<string, XmlNode> dictNodes)
{
    XmlNode lastNode = null;
    XmlNode currentNode = null;
    List<XmlNode> nodesToBeDeleted = new List<XmlNode>();

    // If node from full file matches to incremental record and is change or delete,
    // mark full record to be deleted.
    foreach (XmlNode fullNode in fullDocument.SelectNodes("/List/Records/Person"))
    {
        dictNodes.TryGetValue(fullNode.Attributes[0].Value, out currentNode);
        if (currentNode != null)
        {
            if (currentNode.Attributes["recordaction"].Value == "chg"
                || currentNode.Attributes["recordaction"].Value == "del")
            {
                nodesToBeDeleted.Add(currentNode);
            }
        }
        lastNode = fullNode;
    }

    // Delete marked records
    for (int i = nodeList.Count - 1; i >= 0; i--)
    {
        if (nodesToBeDeleted.Contains(nodeList[i]))
        {
            nodeList[i].ParentNode.RemoveChild(nodesToBeDeleted[i]);
        }
    }

    // Add in the incremental records to the new full file for records marked add or change.
    foreach (XmlNode weeklyNode in nodeList)
    {
        if (weeklyNode.Attributes["recordaction"].Value == "add"
            || weeklyNode.Attributes["recordaction"].Value == "chg")
        {
            fullDocument.InsertAfter(weeklyNode, lastNode);
            lastNode = weeklyNode;
        }
    }
}

The XmlNodeList being passed in is just all of the incremental records that were selected from the incremental file, and the dictionary holds those same nodes keyed on the id so I don't have to loop through all of the incremental records each time. Right now the program is dying at the "Delete marked records" stage with an index-out-of-range exception. I'm pretty sure the "Add in the incremental records" step doesn't work either. Any ideas? Suggestions on making this more efficient would also be welcome. I could potentially run into a problem because it's reading in a 250MB file which balloons up to 750MB in memory, so I was wondering if there was an easier way to go node-by-node through the full file. Thanks!
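Reading the step 3 code, the out-of-range error most likely comes from `nodesToBeDeleted[i]`: `i` indexes the incremental `nodeList`, but `nodesToBeDeleted` usually holds fewer entries. Two further problems are visible. The first loop collects the incremental `currentNode` rather than the full-file `fullNode`, so any removal would happen in the incremental document. And `fullDocument.InsertAfter(weeklyNode, lastNode)` should throw `ArgumentException`, because `weeklyNode` belongs to another document (it needs `XmlDocument.ImportNode` first) and `lastNode` is a child of `<Records>`, not of the document node. Below is a minimal sketch of step 3 with those issues addressed. It assumes `fullDocument` is passed as a parameter instead of being a field, ids are unique, and every record has an `id` attribute; it also uses `XmlNode.ReplaceChild` so a "chg" record is swapped in place rather than removed and re-added.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Sketch of step 3. fullDocument is a parameter here (assumption); nodeList and
// dictNodes are the incremental records as described in the question.
void ProcessChanges(XmlDocument fullDocument, XmlNodeList nodeList, Dictionary<string, XmlNode> dictNodes)
{
    XmlNode records = fullDocument.SelectSingleNode("/List/Records");

    // Pair each full-file node with its incremental record. Pairs are collected first
    // so the document is not modified while the SelectNodes result is being enumerated.
    var matches = new List<(XmlNode Full, XmlNode Inc)>();
    foreach (XmlNode fullNode in records.SelectNodes("Person"))
    {
        if (dictNodes.TryGetValue(fullNode.Attributes["id"].Value, out XmlNode incNode))
            matches.Add((fullNode, incNode));
    }

    foreach (var (oldNode, newNode) in matches)
    {
        switch (newNode.Attributes["recordaction"].Value)
        {
            case "chg":
                // Swap in place; a node from another document must be imported first.
                records.ReplaceChild(fullDocument.ImportNode(newNode, true), oldNode);
                break;
            case "del":
                records.RemoveChild(oldNode); // remove the full-file node, not the incremental one
                break;
        }
    }

    // New records are imported and appended to <Records>, not to the document root.
    foreach (XmlNode weeklyNode in nodeList)
    {
        if (weeklyNode.Attributes["recordaction"].Value == "add")
            records.AppendChild(fullDocument.ImportNode(weeklyNode, true));
    }
}
```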

Tony Trozzo asked Jul 22 '11 17:07


1 Answer

Here's an example of how you might accomplish it with LINQ-to-XML. No dictionary is needed:

using System.Xml.Linq;

// Load the main and incremental xml files into XDocuments
XDocument fullFile = XDocument.Load("fullfilename.xml");
XDocument incrementalFile = XDocument.Load("incrementalfilename.xml");    

// For each Person in the incremental file
foreach (XElement person in incrementalFile.Descendants("Person")) {

    // If the person should be added to the full file
    if (person.Attribute("recordaction").Value == "add") {
        fullFile.Element("List").Element("Records").Add(person); // Add him
    }

    // Else the person already exists in the full file
    else {
        // Find the element of the Person to delete or change
        var personToChange =
                (from p in fullFile.Descendants("Person")
                    where p.Attribute("id").Value == person.Attribute("id").Value
                    select p).Single();

        // Perform the appropriate operation
        switch (person.Attribute("recordaction").Value) {
            case "chg":
                personToChange.ReplaceWith(person);
                break;
            case "del":
                personToChange.Remove();
                break;
            default:
                throw new ApplicationException("Unrecognized attribute");
        }
    }
}// end foreach

// Save the changes to the full file
fullFile.Save("fullfilename.xml");

Please let me know if you have any problems running it and I'll edit and fix it. I'm pretty sure it's correct, but don't have VS available at the moment.

EDIT: fixed the "chg" case to use personToChange.ReplaceWith(person) rather than personToChange = person. The latter doesn't replace anything, because it only reassigns the local variable and leaves the underlying document unchanged.
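The query above rescans every `Person` in the full file for each incremental record, so the work grows with the number of full records times the number of incremental records. A sketch of the same merge that indexes the full file once by id is below. It assumes ids are unique and that every "chg"/"del" id exists in the full file (a missing id throws `KeyNotFoundException`, much as `Single()` throws `InvalidOperationException`); the function name `Merge` is hypothetical. It copies each incremental element explicitly with `new XElement(person)` so the dictionary can track the copy that actually lives in the full document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical variant of the LINQ-to-XML merge with an id index on the full file.
void Merge(XDocument fullFile, XDocument incrementalFile)
{
    XElement records = fullFile.Element("List").Element("Records");
    Dictionary<string, XElement> byId = records.Elements("Person")
        .ToDictionary(p => (string)p.Attribute("id"));

    foreach (XElement person in incrementalFile.Descendants("Person"))
    {
        string id = (string)person.Attribute("id");
        switch ((string)person.Attribute("recordaction"))
        {
            case "add":
            {
                var copy = new XElement(person); // copy so the incremental tree is untouched
                records.Add(copy);
                byId[id] = copy;
                break;
            }
            case "chg":
            {
                var copy = new XElement(person);
                byId[id].ReplaceWith(copy); // swaps the record in place
                byId[id] = copy;            // keep the index pointing at the live element
                break;
            }
            case "del":
                byId[id].Remove();
                byId.Remove(id);
                break;
            default:
                throw new InvalidOperationException("Unrecognized recordaction");
        }
    }
}
```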

Kevin D. answered Sep 28 '22 02:09