Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Framework 5 performance concerns

Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, manually generated.

Everything is working, but there's some complaining about the performances. I've never had performance problems with EF so I'm wondering if this time I just did something terribly wrong, or the problem could reside somewhere else.

The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:

if (parameter != null) { query = query.Where(c => c.Field == parameter); }

Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.

The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.

Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.

After all this, the IQueryable is sent through layers, reaches the MVC layer where Automapper is employed.

Here, when Automapper starts iterating (and thus the query is really executed) it calls a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to Lazy Loading according to EF recommendations to avoid eager loading if you have more than 3 or 4 distinct entities to include. My scenario is something like this:

  • Orders (maximum 10)
    • Many navigation properties under Order
      • Some of these have other navigation under them (localization entities)
    • Order details (many order details per order)
      • Many navigation properties under each Order detail
        • Some of these have other navigation under them (localization entities)

This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but still there are 2 main concerns:

  • The lazy loaded properties are called in sequence and not parallelized, thus taking more time
  • As a consequence of previous point, there's some dead time between each query, as the database has to receive the sql, run it, return it and so on for each query.

Just to see how it went, I tried to make the same query with eager loading, and as I predicted it was a total disaster, with a translated sql of more than 7K lines (yes, seven thousands) and way more slow overall.

Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.

I thought of some things I could do to improve, like:

  • Table splitting to reduce the selected columns
  • Turn off object tracking, as this scenario is read only (have untracked entities)

With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems all "Server Time" and not "Network Time", taking about from 8 to 12 seconds of server time.

From my experience, this should not be happening. I'm wondering if I'm approaching this query need in a wrong way, or if I have to turn my attention to something else (maybe a bad configured IIS server, or anything else I'm really clueless). Needles to say, the database has its indexes ok, checked very carefully by our dba.

So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.

like image 508
Matteo Mosca Avatar asked May 24 '13 19:05

Matteo Mosca


2 Answers

For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:

(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:

var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
    // by deferring the load until here, we can bring only the information we actually need 
    // into memory with a single query
    .ToList();

This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.

(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:

// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());

This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that, in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries) but can't reuse logic you've already written in LINQ.

(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.

like image 83
ChaseMedallion Avatar answered Sep 28 '22 07:09

ChaseMedallion


Try to think of an efficient yet simple sql query to get the data for your views.

Is it even possible?

If not, try to decompose (denormalize) your tables so that less joins is required to get data. Also, are there efficient indexes on table colums to speed up data retrieval?

If yes, forget EF, write a stored procedure and use it to get the data.

Turning tracking off for selected queries is a-must for a read-only scenario. Take a look at my numbers:

http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html

As you can see, the difference between tracking and notracking scenario is significant.

I would experiment with eager loading but not everywhere (so you don't end up with 7k lines long query) but in selected subqueries.

like image 20
Wiktor Zychla Avatar answered Sep 28 '22 08:09

Wiktor Zychla