
XDocument + IEnumerable is causing out of memory exception in System.Xml.Linq.dll

Basically, I have a program which, when it starts, loads a list of files (as FileInfo), and for each file in the list it loads an XML document (as XDocument).

The program then reads data out of it into a container class (storing as IEnumerables), at which point the XDocument goes out of scope.

The program then exports the data from the container class to a database. After the export the container class goes out of scope. However, the garbage collector isn't clearing up the container class, which, because it stores its data as IEnumerable, seems to keep the XDocument in memory (I'm not sure this is the reason, but Task Manager shows that the memory used by the XDocument isn't being freed).

As the program loops through multiple files, it eventually throws an out of memory exception. To mitigate this I've ended up using

System.GC.Collect(); 

to force the garbage collector to run after the container goes out of scope. This is working, but my questions are:

  • Is this the right thing to do? (Forcing the garbage collector to run seems a bit odd)
  • Is there a better way to make sure the XDocument memory is being disposed?
  • Could there be a different reason, other than the IEnumerable, that the document memory isn't being freed?

Thanks.


Edit: Code Samples:

  • Container Class:

    public IEnumerable<CustomClassOne> CustomClassOne { get; set; }
    public IEnumerable<CustomClassTwo> CustomClassTwo { get; set; }
    public IEnumerable<CustomClassThree> CustomClassThree { get; set; }
    ...
    public IEnumerable<CustomClassNine> CustomClassNine { get; set; }
    
  • Custom Class:

    public long VariableOne { get; set; }
    public int VariableTwo { get; set; }
    public DateTime VariableThree { get; set; }
    ...
    

Anyway, those are the basic structures. The custom classes are populated through the container class from the XML document. The filled structures themselves use very little memory.

A container class is filled from one XML document and goes out of scope; the next document is then loaded, e.g.

    public static void ExportAll(IEnumerable<FileInfo> files)
    {
        foreach (FileInfo file in files)
        {
            ExportFile(file);
            //Temporary to clear memory
            System.GC.Collect();
        }
    }
    private static void ExportFile(FileInfo file)
    {
        ContainerClass containerClass = Reader.ReadXMLDocument(file);
        ExportContainerClass(containerClass);
        //Export simply dumps the data from the container class into a database
        //Container Class (and any passed container classes) goes out of scope at end of export
    }

    public static ContainerClass ReadXMLDocument(FileInfo fileToRead)
    {
        XDocument document = GetXDocument(fileToRead);
        var containerClass = new ContainerClass();

        //ForEach customClass in containerClass
        //Read all data for customClass from XDocument

        return containerClass;
    }

I forgot to mention this bit (not sure if it's relevant): the files can be compressed as .gz, so I have the GetXDocument() method to load them

    private static XDocument GetXDocument(FileInfo fileToRead)
    {
        XDocument document;

        using (FileStream fileStream = new FileStream(fileToRead.FullName, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            if (String.Equals(fileToRead.Extension, ".gz", StringComparison.OrdinalIgnoreCase))
            {
                using (GZipStream zipStream = new GZipStream(fileStream, CompressionMode.Decompress))
                {
                    document = XDocument.Load(zipStream);
                }
            }
            else
            {
                document = XDocument.Load(fileStream);
            }
            return document;
        }
    }

Hope this is enough information. Thanks

Edit: System.GC.Collect() is not working 100% of the time; sometimes the program still seems to retain the XDocument. Does anyone have any idea why this might be?

    public static ContainerClass ReadXMLDocument(FileInfo fileToRead)
    {
        XDocument document = GetXDocument(fileToRead);
        var containerClass = new ContainerClass();

        //ForEach customClass in containerClass
        //Read all data for customClass from XDocument

        containerClass.CustomClassOne = document.Descendants(ElementName)
            .DescendantsAndSelf(ElementChildName)
            .Select(a => ExtractDetails(a));

        return containerClass;
    }

    private static CustomClassOne ExtractDetails(XElement itemElement)
    {
        var customClassOne = new CustomClassOne();
        customClassOne.VariableOne = Int64.Parse(itemElement.Attribute("id").Value.Substring(4));
        customClassOne.VariableTwo = int.Parse(itemElement.Element(osgb + "version").Value);
        customClassOne.VariableThree = DateTime.ParseExact(itemElement.Element(osgb + "versionDate").Value,
                "yyyy-MM-dd", CultureInfo.InvariantCulture);
        return customClassOne;
    }

Manatherin asked Dec 15 '10 16:12


2 Answers

Forcing a manual garbage collection might appear to have solved your problem in some cases, but it's a pretty sure bet that this is nothing better than coincidence.

What you need to do is to stop guessing about what is causing your memory pressure problems, and to instead find out for sure.

I've used JetBrains dotTrace to very good effect in similar situations - set a breakpoint, trigger the profiler and browse through a view of all the "live" objects and their relationships. Makes it easy to find which objects are still retained, and by which references they're kept live.

While I haven't used it myself, the Redgate ANTS Memory Profiler is also recommended by many.

Both of these tools have free trials, which should be enough to solve your current problem. Though, I'd strongly suggest that it's worth buying one or the other - dotTrace has saved me dozens of hours of troubleshooting memory issues, a very worthwhile ROI.
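
Short of a full profiler, a quick sanity check is possible with a WeakReference: hold only a weak reference to the XDocument, drop every strong reference you know about, force a collection, and see whether the document is still alive. The sketch below is only an illustration under assumptions: `RetentionCheck`, `LoadAndRelease` and `IsStillAlive` are hypothetical names, the XML is made up, and garbage collection timing is not guaranteed, so treat the result as a hint rather than proof. The forced collection here is purely diagnostic and is not meant as a fix.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Xml.Linq;

public static class RetentionCheck
{
    // Kept out of line so the local 'document' cannot stay alive in the caller's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference LoadAndRelease(string xml)
    {
        XDocument document = XDocument.Parse(xml);

        // ... read whatever you need from the document here ...

        // Only a weak reference leaves this method; it does not keep the document alive.
        return new WeakReference(document);
    }

    // Diagnostic only: force a full collection, then ask whether the target survived.
    public static bool IsStillAlive(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return reference.IsAlive;
    }

    public static void Main()
    {
        WeakReference reference = LoadAndRelease("<root><item id=\"1\"/></root>");

        Console.WriteLine(IsStillAlive(reference)
            ? "document is still reachable from somewhere"
            : "document was collected");
    }
}
```

If the document stays alive once your own reading code is plugged into `LoadAndRelease`, something you returned or stored is still pointing at it, and a profiler will tell you exactly what.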

Bevan answered Sep 21 '22 20:09


Your code doesn't look bad to me, and I don't see any reason for forcing a collection. If your custom class holds a reference to XElements from the XDocument, then the GC will collect neither them nor the document itself. If something else is holding references to your enumerables, then they won't be collected either. So I'd really like to see your custom class definition and how it's populated.
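
One way this kind of reference can arise without being obvious is LINQ's deferred execution. If a property is assigned the result of `document.Descendants(...).Select(...)`, as in the question's ReadXMLDocument, what gets stored is a query object rather than a list of results. That query keeps a reference to the document's nodes and re-reads them each time it is enumerated. Calling `ToList()` runs the query once and keeps only the resulting objects. The following is a minimal sketch of the difference, assuming a hypothetical `Item` class and a made-up `item` element name rather than the asker's real types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public sealed class Item
{
    public long Id { get; set; }
}

public static class MaterializeSketch
{
    // Deferred: returns a query that still references the document's nodes
    // and is re-evaluated on every enumeration.
    public static IEnumerable<Item> ReadDeferred(XDocument document)
    {
        return document.Descendants("item")
            .Select(e => new Item { Id = long.Parse(e.Attribute("id").Value) });
    }

    // Materialized: ToList() runs the query immediately; the returned list
    // holds only Item objects and no reference back to the XML tree.
    public static List<Item> ReadMaterialized(XDocument document)
    {
        return document.Descendants("item")
            .Select(e => new Item { Id = long.Parse(e.Attribute("id").Value) })
            .ToList();
    }

    public static void Main()
    {
        XDocument document = XDocument.Parse("<root><item id=\"1\"/><item id=\"2\"/></root>");

        IEnumerable<Item> deferred = ReadDeferred(document);
        List<Item> materialized = ReadMaterialized(document);

        // Change the document after both reads to show which result is still tied to it.
        document.Root.Add(new XElement("item", new XAttribute("id", "3")));

        Console.WriteLine(deferred.Count());   // prints 3: the query ran against the modified document
        Console.WriteLine(materialized.Count); // prints 2: the list was fixed when ToList() ran
    }
}
```

If the container's properties hold deferred queries like the first method, the document stays reachable for as long as the container does. Materializing with `ToList()` (or `ToArray()`) inside the read method lets the XDocument become unreachable as soon as that method returns.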

Schultz9999 answered Sep 17 '22 20:09