
How to avoid memory overflow when querying large datasets with Entity Framework and LINQ

I have a class that handles all database methods, including Entity Framework related stuff. When data is needed, other classes may invoke a method in this class such as

public List<LocalDataObject> GetData(int start, int end);

The database is queried using LINQ to EF, and the calling class can then iterate over the data. But since other classes have no access to the EF entities, I need to perform a ToList() on the query, which fetches the full result set into memory.
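For reference, the problematic pattern looks roughly like this (a sketch; the `MyEntities` context and entity set names are assumptions, not from the original question):

```csharp
public List<LocalDataObject> GetData(int start, int end)
{
    using (var context = new MyEntities()) // hypothetical ObjectContext name
    {
        return context.LocalDataObjects
                      .OrderBy(d => d.Id)
                      .Skip(start)
                      .Take(end - start)
                      .ToList(); // materializes the entire requested range in memory
    }
}
```

The ToList() call is what forces full materialization: because callers cannot see the EF entities or the query itself, the entire range must be copied into a list before it crosses the layer boundary.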

What will happen if this set is VERY large (10s-100s of GB)?

Is there a more efficient way to iterate over the data while still maintaining loose coupling?

Asked May 08 '11 by Saul

2 Answers

The correct way to work with large datasets in Entity Framework is:

  • Use EFv4 and POCO objects - this allows sharing objects with the upper layer without introducing a dependency on Entity Framework
  • Turn off proxy creation / lazy loading to fully detach POCO entities from the object context
  • Expose IQueryable<EntityType> so the upper layer can specify the query more precisely and limit the number of records loaded from the database
  • When exposing IQueryable, set MergeOption.NoTracking on the ObjectQuery in your data access method. Combined with disabled proxy creation, this means entities are not cached, and iterating over the query result materializes only a single entity at a time.

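The steps above can be sketched as follows. This is a minimal illustration assuming an EFv4 `ObjectContext` named `MyEntities` and a POCO entity `LocalDataObject`; both names are assumptions, not from the original question:

```csharp
using System.Data.Objects; // EFv4 ObjectContext / MergeOption

// Repository exposing IQueryable with proxies, lazy loading, and tracking disabled.
public class DataRepository : IDisposable
{
    private readonly MyEntities _context = new MyEntities();

    public DataRepository()
    {
        // Fully detach POCO entities from the object context.
        _context.ContextOptions.ProxyCreationEnabled = false;
        _context.ContextOptions.LazyLoadingEnabled = false;
    }

    public IQueryable<LocalDataObject> Data
    {
        get
        {
            var set = _context.CreateObjectSet<LocalDataObject>();
            set.MergeOption = MergeOption.NoTracking; // don't cache materialized entities
            return set;
        }
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}
```

The upper layer can then compose its own query so that only the records it actually needs are loaded, e.g. `repository.Data.Where(d => d.Id >= start && d.Id < end)`, without ever referencing Entity Framework types directly.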
In your simple scenario you can always check that the client doesn't ask for too many records and simply throw an exception or return only the maximum allowed number of records.

Answered Sep 19 '22 by Ladislav Mrnka

As much as I like EF for quick/simple data access, I probably wouldn't use it for such a scenario. When dealing with data of that size I'd opt for stored procedures that return exactly what you need, and nothing extra. Then use a lightweight DataReader to populate your objects.

The DataReader provides an unbuffered stream of data that allows procedural logic to efficiently process results from a data source sequentially. The DataReader is a good choice when retrieving large amounts of data because the data is not cached in memory.

Additionally, as far as memory management goes, make sure you wrap code that handles unmanaged resources in a using block so that connections and readers are disposed deterministically.
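A minimal sketch of that approach, assuming a hypothetical stored procedure `GetLargeData` with `@Start`/`@End` parameters and assumed column names (`Id`, `Name`):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class DataStreamer
{
    public static IEnumerable<LocalDataObject> StreamData(
        string connectionString, int start, int end)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("GetLargeData", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@Start", start);
            command.Parameters.AddWithValue("@End", end);

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Rows are yielded one at a time; nothing beyond the
                    // current row is buffered in memory.
                    yield return new LocalDataObject
                    {
                        Id = reader.GetInt32(reader.GetOrdinal("Id")),
                        Name = reader.GetString(reader.GetOrdinal("Name"))
                    };
                }
            }
        }
    }
}
```

Because the method returns `IEnumerable<LocalDataObject>` and uses `yield return`, callers can iterate lazily over arbitrarily large result sets, and the using blocks guarantee the reader and connection are closed even if iteration stops early.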

You also may want to consider implementing paging.
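Paging can be sketched with Skip/Take over an exposed IQueryable. Here `source` stands in for whatever queryable your data layer returns, and `Process` is a hypothetical consumer; both are assumptions:

```csharp
// Process a large dataset one fixed-size page at a time.
const int pageSize = 1000;
for (int page = 0; ; page++)
{
    var batch = source
        .OrderBy(d => d.Id)       // a stable ordering is required for Skip/Take
        .Skip(page * pageSize)
        .Take(pageSize)
        .ToList();                // only one page is in memory at a time

    if (batch.Count == 0)
        break;

    foreach (var item in batch)
        Process(item);
}
```

The trade-off is one round trip per page, but peak memory use is bounded by `pageSize` regardless of the total size of the dataset.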

Answered Sep 19 '22 by Kon