Read huge table with LINQ to SQL: Running out of memory vs slow paging

I have a huge table that I need to read through in a certain order to compute some aggregate statistics. The table already has a clustered index for the correct order, so getting the records themselves is pretty fast. I'm trying to use LINQ to SQL to simplify the code that I need to write. The problem is that I don't want to load all the objects into memory, since the DataContext seems to keep them around, yet trying to page them results in horrible performance problems.

Here's the breakdown. My original attempt was this:

var logs = 
    (from record in dataContext.someTable 
     where [index is appropriate]
     select record);

foreach( linqEntity l in logs )
{
    // Do stuff with data from l
}

This is pretty fast and streams at a good rate, but the application's memory use keeps going up and never stops. My guess is that the LINQ to SQL entities are being kept around in memory and never released. So after reading "Out of memory when creating a lot of objects C#", I tried the following approach. It is the common Skip/Take paging pattern that many people use, with the added goal of saving memory.

Note that _conn is created beforehand, and a temporary data context is created for each query, so the entities associated with it can be garbage collected once that context is disposed.

int skipAmount = 0;

while (true)
{
    // Trick to allow for automatic garbage collection while iterating through the DB:
    // each batch gets its own short-lived DataContext.
    using (var tempDataContext = new MyDataContext(_conn) {CommandTimeout = 600})
    {
        var query =
            (from record in tempDataContext.someTable
             where [index is appropriate]
             select record);

        // Materialize one page of entities
        var logs = query.Skip(skipAmount).Take(BatchSize).ToList();
        if (logs.Count == 0)
        {
            break; // no rows left
        }

        foreach (var l in logs)
        {
            // Do stuff with data from l
        }

        skipAmount += logs.Count;
    }
}

Now I have the desired behavior: memory usage doesn't increase at all as I stream through the data. Yet I have a far worse problem: each Skip causes the data to load more and more slowly, because the underlying query seems to make the server go through all the data for all previous pages. As the loop runs, each page takes longer and longer to load, and I can tell that this is turning into a quadratic operation. This problem has appeared in the following posts:

LINQ Skip() Problem
LINQ2SQL select orders and skip/take
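
To put a rough number on the quadratic growth described above, here is a small back-of-the-envelope sketch. It assumes the server really does walk past every skipped row before returning each page, which is the behavior observed above rather than something confirmed from a query plan. The class and method names are made up for illustration.

```csharp
using System;

public static class SkipCost
{
    // Rows the server touches in total if each page must first walk past
    // all previously skipped rows (assumption: no seek to the page start).
    public static long RowsTouchedWithSkip(long totalRows, long batchSize)
    {
        long touched = 0;
        for (long skipped = 0; skipped < totalRows; skipped += batchSize)
        {
            long pageSize = Math.Min(batchSize, totalRows - skipped);
            touched += skipped + pageSize; // skipped rows plus the page itself
        }
        return touched;
    }

    public static void Main()
    {
        // 1,000,000 rows read in pages of 1,000
        Console.WriteLine(RowsTouchedWithSkip(1000000, 1000)); // prints 500500000
    }
}
```

Under that assumption, a table of one million rows read in pages of 1,000 touches about 500 million rows in total, versus one million for a single streaming pass. Doubling the table size roughly quadruples the work.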

I can't seem to find a way to do this with LINQ that keeps memory use bounded by paging the data and yet still loads each page in constant time. Is there a way to do this properly? My hunch is that there might be some way to tell the DataContext to explicitly forget about the objects in the first approach above, but I can't find out how to do that.

asked Sep 17 '12 23:09

Andrew Mao

1 Answer

After madly grasping at some straws, I found that the DataContext's ObjectTrackingEnabled = false could be just what the doctor ordered. It is, not surprisingly, specifically designed for a read-only case like this.

using (var readOnlyDataContext =
    new MyDataContext(_conn) {CommandTimeout = really_long, ObjectTrackingEnabled = false})
{
    var logs =
        (from record in readOnlyDataContext.someTable
         where [index is appropriate]
         select record);

    foreach (linqEntity l in logs)
    {
        // Do stuff with data from l
    }
}

With the above approach, memory use no longer grows while streaming through the objects. When writing data, I can use a different DataContext that has object tracking enabled, and that seems to work okay. However, this approach does leave me with a single SQL query that can take an hour or more to stream and complete, so if there's a way to do the paging as above without the performance hit, I'm open to other alternatives.

A warning about turning object tracking off: I found out that when you try to do multiple concurrent reads with the same DataContext, you don't get the error "There is already an open DataReader associated with this Command which must be closed first." Instead, the application just goes into an infinite loop with 100% CPU usage. I'm not sure if this is a C# bug or a feature.
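
On the open question of paging without the Skip penalty, one commonly suggested alternative (not something tested in this post) is keyset, or "seek", paging: instead of skipping N rows, remember the last key seen and ask for the next batch of rows with a greater key. The sketch below assumes the clustered index is on a single unique column, here called Id; LogRow, KeysetPaging and ReadInBatches are hypothetical names, and an in-memory list stands in for the table so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: Id stands in for the clustered-index column.
public class LogRow
{
    public long Id { get; set; }
    public double Value { get; set; }
}

public static class KeysetPaging
{
    // Yields every row in Id order, fetching at most batchSize rows per query.
    // openTable is called once per batch and returns a fresh queryable.
    public static IEnumerable<LogRow> ReadInBatches(
        Func<IQueryable<LogRow>> openTable, int batchSize)
    {
        long? lastId = null;
        while (true)
        {
            IQueryable<LogRow> query = openTable();
            if (lastId.HasValue)
            {
                long after = lastId.Value;
                // Seek past the last key instead of skipping a row count
                query = query.Where(r => r.Id > after);
            }

            List<LogRow> page = query.OrderBy(r => r.Id).Take(batchSize).ToList();
            if (page.Count == 0)
            {
                yield break; // no rows left
            }

            foreach (LogRow row in page)
            {
                yield return row;
            }

            lastId = page[page.Count - 1].Id;
        }
    }

    public static void Main()
    {
        // In-memory stand-in for the table: Ids 10, 20, ..., 100
        List<LogRow> table = Enumerable.Range(1, 10)
            .Select(i => new LogRow { Id = i * 10, Value = i })
            .ToList();

        double sum = ReadInBatches(() => table.AsQueryable(), 3).Sum(r => r.Value);
        Console.WriteLine(sum); // prints 55
    }
}
```

Against a real database, openTable would return the table from a fresh DataContext for each batch, created and disposed as in the Skip/Take loop in the question. Whether each batch then costs a constant amount depends on the provider turning the Where/OrderBy/Take into an index seek, so it is worth inspecting the generated SQL (for example through DataContext.Log) before relying on it. A composite or non-unique clustered key would need a more involved comparison than the single Id > after shown here.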

answered Sep 30 '22 15:09

Andrew Mao

Tags: c#, sql, database, linq, skip-take
