
Entity Framework: dealing with a large number of records (> 35 million)

We have a rather large set of related tables with over 35 million related records each. I need to create a couple of WCF methods that would query the database with some parameters (date ranges, type codes, etc.) and return the related result sets (from 10 to 10,000 records).

The company has standardized on EF 4.0 but is open to 4.x. I might be able to make an argument to migrate to 5.0, but that's less likely.

What’s the best approach to dealing with such a large number of records using Entity Framework? Should I create a set of stored procs and call them from Entity Framework, or is there something I can do within Entity Framework itself?

I do not have any control over the databases so I cannot split the tables or create some materialized views or partitioned tables.

Any input/idea/suggestion is greatly appreciated.

Mike Z asked Jan 07 '13

2 Answers

At my work I faced a similar situation. We had a database with many tables, most of which contained around 7–10 million records each. We used Entity Framework to display the data, but the page was very slow to load (90 to 100 seconds); even sorting the grid took time. I was given the task of seeing whether it could be optimized, and after profiling it (with ANTS Profiler) I got it under 7 seconds.

So the answer is yes, Entity Framework can handle large numbers of records (in the millions), but some care must be taken:

  1. Understand that the call to the database is made only when the actual records are required; all the preceding operations just build up the query (SQL). So try to fetch only the piece of data you need rather than requesting a large number of records, and trim the fetch size as much as possible.
  2. You not only should, you must use stored procedures: import them into your model and create function imports for them. You can also call them directly via ExecuteStoreCommand() and ExecuteStoreQuery<>(). The same goes for functions and views, but EF has a really odd way of calling scalar functions: "SELECT dbo.blah(@id)".
  3. EF is slower when it has to populate an entity with a deep hierarchy, so be extremely careful with entities that have deep hierarchies.
  4. When you are retrieving records and do not need to modify them, tell EF not to watch for property changes (AutoDetectChanges); record retrieval is much faster that way.
  5. Database indexing is always good, but with EF it becomes very important: the columns you use for retrieval and sorting must be properly indexed.
  6. When your model is large, the VS2010/VS2012 model designer becomes unwieldy, so break your model into medium-sized models. Be aware of the limitation that entities from different models cannot be shared, even if they map to the same table in the database.
  7. When you have to change the same entity in different places, pass the same entity around and save the changes once, rather than having each place fetch a fresh copy, make its changes, and store it (a real performance-gain tip).
  8. When you need only one or two columns, try not to fetch the full entity; either execute your SQL directly or project into a mini entity. You may also want to cache some frequently used data in your application.
  9. Transactions are slow; be careful with them.
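
Tips 1, 4, and 8 can be sketched together in EF 4 syntax. This is a minimal illustration, assuming a hypothetical ObjectContext called MyEntities with an Orders entity set; the entity and property names are made up:

```csharp
using System.Data.Objects;   // MergeOption lives here in EF 4
using System.Linq;

using (var ctx = new MyEntities())   // hypothetical ObjectContext
{
    // Tip 4: this is a read-only query, so skip change tracking entirely.
    ctx.Orders.MergeOption = MergeOption.NoTracking;

    // Tips 1 and 8: nothing is sent to SQL Server until the query is
    // enumerated, and the projection fetches only the two columns we need.
    var rows = ctx.Orders
                  .Where(o => o.OrderDate >= from && o.OrderDate < to)
                  .Select(o => new { o.OrderId, o.Total })
                  .Take(10000)          // trim the fetch size
                  .ToList();            // the database call happens here
}
```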

If you keep these things in mind, EF should give you almost the same performance as plain ADO.NET, if not identical.
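
For calling a stored procedure directly (tip 2), ExecuteStoreQuery<>() materializes the results into a plain class by matching column names. A sketch, where the procedure, parameter, and class names are all assumptions:

```csharp
using System.Data.SqlClient;
using System.Linq;

// Hypothetical result type whose property names match the proc's columns.
public class OrderSummary
{
    public int OrderId { get; set; }
    public decimal Total { get; set; }
}

// EF 4 ObjectContext: run the proc directly, skipping entity-mapping overhead.
var results = ctx.ExecuteStoreQuery<OrderSummary>(
    "EXEC dbo.GetOrderSummaries @from, @to",
    new SqlParameter("@from", from),
    new SqlParameter("@to", to)).ToList();
```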

Simple Fellow answered Nov 04 '22

My experience with EF 4.1, code first: if you only need to read the records (i.e. you won't write them back), you will gain a performance boost by turning off change tracking for your context:

yourDbContext.Configuration.AutoDetectChangesEnabled = false;

Do this before loading any entities. If you need to update the loaded records, you can always call

yourDbContext.ChangeTracker.DetectChanges();

before calling SaveChanges().
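
Put together, the whole read-then-update pattern looks something like this; it's a sketch with a hypothetical DbContext, entity set, and property names:

```csharp
using System;
using System.Linq;

using (var ctx = new YourDbContext())   // hypothetical code-first DbContext
{
    // Load without the per-entity change scans EF normally performs.
    ctx.Configuration.AutoDetectChangesEnabled = false;

    var customers = ctx.Customers
                       .Where(c => c.IsActive)
                       .ToList();

    foreach (var c in customers)
        c.LastChecked = DateTime.UtcNow;

    // With AutoDetectChanges off, SaveChanges() won't scan for you,
    // so re-scan the tracked entities once, then persist.
    ctx.ChangeTracker.DetectChanges();
    ctx.SaveChanges();
}
```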

Stephan Keller answered Nov 04 '22