Entity Framework large data set, out of memory exception

I am working with a very large data set, roughly 2 million records. I have the code below but get an out of memory exception after it has processed around three batches, about 600,000 records. I understand that as it loops through each batch, Entity Framework lazy loads the data and then tries to build up the full 2 million records in memory. Is there any way to unload each batch once I've processed it?

    ModelContext dbContext = new ModelContext();
    IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.OrderBy(t => t.TownID).Batch(200000);
    foreach (var batch in towns)
    {
        SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });
    }

Note: The Batch method comes from this project: https://code.google.com/p/morelinq/
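
For context, Batch lazily splits a sequence into fixed-size chunks, yielding each chunk as it fills. A minimal sketch of roughly equivalent behaviour (not MoreLINQ's actual implementation) looks like this:

    // Sketch only, not MoreLINQ's implementation: buffer items into
    // fixed-size buckets and yield each bucket as soon as it is full.
    static IEnumerable<IEnumerable<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        // Yield any trailing partial bucket.
        if (bucket.Count > 0)
            yield return bucket;
    }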

The search client is this: https://github.com/Mpdreamz/NEST

asked Aug 11 '13 by Mike Norgate

1 Answer

The issue is that when you get data from EF there are actually two copies of the data created: one that is returned to the user and a second that EF holds onto and uses for change detection (so that it can persist changes to the database). EF holds this second set for the lifetime of the context, and it's this set that's running you out of memory.
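
You can watch this happen. A rough sketch, reusing the question's setup, that prints the number of tracked entities after each batch (ChangeTracker.Entries() is the standard DbContext API for inspecting tracked entities):

    // Illustration only, assuming the same ModelContext/Batch/SearchClient
    // setup as the question.
    using (var dbContext = new ModelContext())
    {
        foreach (var batch in dbContext.Towns.OrderBy(t => t.TownID).Batch(200000))
        {
            SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });

            // Every materialised Town stays in the change tracker, so this
            // count keeps climbing until memory runs out.
            Console.WriteLine(dbContext.ChangeTracker.Entries().Count());
        }
    }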

You have two options to deal with this:

  1. Renew your context for each batch (see the sketch after the explanation below)
  2. Use .AsNoTracking() in your query, e.g.:

    IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.AsNoTracking().OrderBy(t => t.TownID).Batch(200000); 

This tells EF not to keep a copy for change detection. You can read a little more about what AsNoTracking does and its performance impact on my blog: http://blog.staticvoid.co.nz/2012/4/2/entity_framework_and_asnotracking
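
For the first option, here is one possible sketch of renewing the context per batch. It is an illustration under assumptions, not the only way to do it: it replaces MoreLINQ's Batch with Skip/Take paging ordered by TownID, because each batch has to be queried from a fresh context:

    // Sketch: dispose and recreate the context for each batch so entities
    // tracked during previous batches become eligible for garbage collection.
    const int batchSize = 200000;
    int processed = 0;
    while (true)
    {
        using (var dbContext = new ModelContext())
        {
            var batch = dbContext.Towns
                .OrderBy(t => t.TownID)
                .Skip(processed)
                .Take(batchSize)
                .ToList();
            if (batch.Count == 0)
                break;

            SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });
            processed += batch.Count;
        }
    }

Keyset paging (filtering on TownID greater than the last value seen) would scale better than Skip/Take over 2 million rows, at the cost of assuming TownID is a comparable key.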

answered Sep 21 '22 by Not loved