What's the most efficient way to select multiple entities by primary key?
public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids) { //return ids.Select(id => Images.Find(id)); //is this cool? return Images.Where( im => ids.Contains(im.Id)); //is this better, worse or the same? //is there a (better) third way? }
I realise that I could do some performance tests to compare, but I am wondering if there is in fact a better way than both, and am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.
EF5 is built into the core of . NET 4.5, whereas EF6 has been shifted out, and is open source. This means that you must add the new EF6 assemblies to all of the relevant projects in the solution, in particular the entry project. This means that you must remove assembly System.
Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself is executed fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL.
Entity Framework has the following forms of caching built-in: Object caching – the ObjectStateManager built into an ObjectContext instance keeps track in memory of the objects that have been retrieved using that instance. This is also known as first-level cache.
UPDATE: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved dramatically. The analysis in this answer is great but largely obsolete since 2013.
Using Contains
in Entity Framework is actually very slow. It's true that it translates into an IN
clause in SQL and that the SQL query itself is executed fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which will be created is expanded into a long chain of OR
concatenations because there is no native expression which represents an IN
. When the SQL is created this expression of many OR
s is recognized and collapsed back into the SQL IN
clause.
This does not mean that using Contains
is worse than issuing one query per element in your ids
collection (your first option). It's probably still better - at least for not too large collections. But for large collections it is really bad. I remember that I had tested some time ago a Contains
query with about 12.000 elements which worked but took around a minute even though the query in SQL executed in less than a second.
It might be worth to test the performance of a combination of multiple roundtrips to the database with a smaller number of elements in a Contains
expression for each roundtrip.
This approach and also the limitations of using Contains
with Entity Framework is shown and explained here:
Why does the Contains() operator degrade Entity Framework's performance so dramatically?
It's possible that a raw SQL command will perform best in this situation which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString)
or dbContext.Images.SqlQuery(sqlString)
where sqlString
is the SQL shown in @Rune's answer.
Edit
Here are some measurements:
I have done this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked randomly 20000 ids:
using (var context = new MyDbContext()) { Random rand = new Random(); var ids = new List<int>(); for (int i = 0; i < 20000; i++) ids.Add(rand.Next(550000)); Stopwatch watch = new Stopwatch(); watch.Start(); // here are the code snippets from below watch.Stop(); var msec = watch.ElapsedMilliseconds; }
Test 1
var result = context.Set<MyEntity>() .Where(e => ids.Contains(e.ID)) .ToList();
Result -> msec = 85.5 sec
Test 2
var result = context.Set<MyEntity>().AsNoTracking() .Where(e => ids.Contains(e.ID)) .ToList();
Result -> msec = 84.5 sec
This tiny effect of AsNoTracking
is very unusual. It indicates that the bottleneck is not object materialization (and not SQL as shown below).
For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.
Test 3
var values = new StringBuilder(); values.AppendFormat("{0}", ids[0]); for (int i = 1; i < ids.Count; i++) values.AppendFormat(", {0}", ids[i]); var sql = string.Format( "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})", values); var result = context.Set<MyEntity>().SqlQuery(sql).ToList();
Result -> msec = 5.1 sec
Test 4
// same as Test 3 but this time including AsNoTracking var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();
Result -> msec = 3.8 sec
This time the effect of disabling tracking is more noticable.
Test 5
// same as Test 3 but this time using Database.SqlQuery var result = context.Database.SqlQuery<MyEntity>(sql).ToList();
Result -> msec = 3.7 sec
My understanding is that context.Database.SqlQuery<MyEntity>(sql)
is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking()
, so there is no difference expected between Test 4 and Test 5.
(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
Edit 2
Test 6
Even 20000 roundtrips to the database are faster than using Contains
:
var result = new List<MyEntity>(); foreach (var id in ids) result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));
Result -> msec = 73.6 sec
Note that I have used SingleOrDefault
instead of Find
. Using the same code with Find
is very slow (I cancelled the test after several minutes) because Find
calls DetectChanges
internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false
) leads to roughly the same performance as SingleOrDefault
. Using AsNoTracking
reduces the time by one or two seconds.
Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With