
How do I reduce the memory footprint with large datasets in EF5?

I'm trying to pull a large-ish dataset (1.4 million records) from SQL Server and dump it to a file in a WinForms application. I've tried paging so that I'm not holding too much in memory at once, but the process's memory footprint keeps growing as it runs. About 25% of the way through, it was using 600,000K. Am I doing the paging wrong? How can I keep the memory usage from growing so much?

var query = (from organizations in ctxObj.Organizations
                 where organizations.org_type_cd == 1
                 orderby organizations.org_ID
                 select organizations);
int recordCount = query.Count();
int skipTo = 0;
int take = 1000;
if (recordCount > 0)
{
    while (skipTo < recordCount)
    {
        if (skipTo + take > recordCount) 
            take = recordCount - skipTo;

        foreach (Organization o in query.Skip(skipTo).Take(take))
        {
            writeRecord(o);
        }
        skipTo += take;
    }
}
Homr Zodyssey asked Nov 06 '13 16:11

1 Answer

The object context keeps every object it has loaded in memory until the context is disposed. I would recommend disposing the context after each batch so the memory footprint stops growing.

You can also use AsNoTracking() (http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx) since you are not saving back to the database.
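
Putting both suggestions together, a rough sketch might look like the following. It assumes the model uses the DbContext API, where AsNoTracking() is an extension method in System.Data.Entity. MyEntities is a made-up name standing in for your context class; Organizations, org_type_cd, org_ID and writeRecord are taken from the question. The fragment is meant to sit inside a method and has not been run against your schema.

```csharp
using System.Data.Entity;   // AsNoTracking() extension method
using System.Linq;

// MyEntities is a hypothetical name for the DbContext-derived class.
const int take = 1000;
int skipTo = 0;
int written;

do
{
    written = 0;

    // A fresh context per batch, so whatever it holds is released on Dispose.
    using (var ctx = new MyEntities())
    {
        var page = ctx.Organizations
                      .AsNoTracking()              // read-only: skip change tracking
                      .Where(o => o.org_type_cd == 1)
                      .OrderBy(o => o.org_ID)      // Skip/Take needs a stable order
                      .Skip(skipTo)
                      .Take(take);

        foreach (Organization o in page)
        {
            writeRecord(o);
            written++;
        }
    }

    skipTo += written;
} while (written == take);   // a short or empty page means the end was reached
```

Counting the rows actually written, instead of calling Count() up front, saves one query and ends the loop cleanly if the table changes during the export. Either of the two changes may be enough on its own; using both is simply the cautious option.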

mayabelle answered Sep 22 '22 18:09
