 

Why does the Contains() operator degrade Entity Framework's performance so dramatically?

UPDATE 3: According to this announcement, this has been addressed by the EF team in EF6 alpha 2.

UPDATE 2: I've created a suggestion to fix this problem. To vote for it, go here.

Consider a SQL database with one very simple table.

CREATE TABLE Main (Id INT PRIMARY KEY) 

I populate the table with 10,000 records.

WITH Numbers AS (
    SELECT 1 AS Id
    UNION ALL
    SELECT Id + 1 AS Id FROM Numbers WHERE Id < 10000
)
INSERT Main (Id)
SELECT Id FROM Numbers
OPTION (MAXRECURSION 0)

I build an EF model for the table and run the following query in LINQPad (I am using "C# Statements" mode so LINQPad doesn't create a dump automatically).

var rows = Main.ToArray();

Execution time is ~0.07 seconds. Now I add the Contains operator and re-run the query.

var ids = Main.Select(a => a.Id).ToArray();
var rows = Main
    .Where(a => ids.Contains(a.Id))
    .ToArray();

Execution time for this case is 20.14 seconds (288 times slower)!

At first I suspected that the T-SQL emitted for the query was taking longer to execute, so I tried cutting and pasting it from LINQPad's SQL pane into SQL Server Management Studio.

SET NOCOUNT ON
SET STATISTICS TIME ON

SELECT [Extent1].[Id] AS [Id]
FROM [dbo].[Main] AS [Extent1]
WHERE [Extent1].[Id] IN (1,2,3,4,5,6,7,8,...

And the result was

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 88 ms.

Next I suspected LINQPad was causing the problem, but performance is the same whether I run it in LINQPad or in a console application.

So, it appears that the problem is somewhere within Entity Framework.

Am I doing something wrong here? This is a time-critical part of my code, so is there something I can do to speed up performance?

I am using Entity Framework 4.1 and SQL Server 2008 R2.

UPDATE 1:

In the discussion below there were some questions about whether the delay occurred while EF was building the initial query or while it was parsing the data it received back. To test this I ran the following code,

var ids = Main.Select(a => a.Id).ToArray();
var rows =
    (ObjectQuery<MainRow>)Main
    .Where(a => ids.Contains(a.Id));
var sql = rows.ToTraceString();

which forces EF to generate the query without executing it against the database. The result was that this code required ~20 seconds to run, so it appears that almost all of the time is taken in building the initial query.

CompiledQuery to the rescue, then? Not so fast ... CompiledQuery requires the parameters passed into the query to be fundamental types (int, string, float, and so on). It won't accept arrays or IEnumerable, so I can't use it for a list of Ids.
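For illustration, here is the shape of that restriction. This is a sketch, not code from the question: it assumes an ObjectContext named ContainsTestEntities with an entity set MainItems of Main entities (the same names used in the accepted answer below).

using System;
using System.Data.Objects;
using System.Linq;

public static class PrecompiledQueries
{
    // Works: a single int is a supported scalar parameter type.
    public static readonly Func<ContainsTestEntities, int, IQueryable<Main>> ById =
        CompiledQuery.Compile((ContainsTestEntities ctx, int id) =>
            ctx.MainItems.Where(a => a.Id == id));

    // Fails at run time: int[] is not a scalar type, so the precompiled
    // query cannot be parameterized on the array of ids.
    // public static readonly Func<ContainsTestEntities, int[], IQueryable<Main>> ByIds =
    //     CompiledQuery.Compile((ContainsTestEntities ctx, int[] ids) =>
    //         ctx.MainItems.Where(a => ids.Contains(a.Id)));
}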

asked Oct 26 '11 by Mike




2 Answers

UPDATE: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved dramatically. The approach described in this answer is no longer necessary.

You are right that most of the time is spent processing the translation of the query. EF's provider model doesn't currently include an expression that represents an IN clause, so ADO.NET providers can't support IN natively. Instead, the implementation of Enumerable.Contains translates it to a tree of OR expressions, i.e. for something that in C# looks like this:

new[] { 1, 2, 3, 4 }.Contains(i)

... we will generate a DbExpression tree that could be represented like this:

((1 = @i) OR (2 = @i)) OR ((3 = @i) OR (4 = @i)) 

(The expression trees have to be balanced: if all the ORs were chained along a single long spine, there would be a greater chance of the expression visitor hitting a stack overflow; yes, we actually did hit that in our testing.)
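To illustrate the balancing idea, here is a minimal sketch of building such a balanced tree of OR-ed equality checks with the Expression API. This is not EF's internal code; the names, and the assumption of a non-empty list of ints, are mine.

using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class BalancedOr
{
    // Recursively splits the values in half, so the resulting tree has
    // O(log n) depth; a recursive expression visitor can then walk it
    // without exhausting the stack, even for thousands of values.
    public static Expression Build(IList<int> values, Expression param)
    {
        if (values.Count == 1)
            return Expression.Equal(Expression.Constant(values[0]), param);
        int mid = values.Count / 2;
        return Expression.OrElse(
            Build(values.Take(mid).ToList(), param),
            Build(values.Skip(mid).ToList(), param));
    }
}

// Example:
//   var i = Expression.Parameter(typeof(int), "i");
//   BalancedOr.Build(new[] { 1, 2, 3, 4 }, i).ToString()
//   => "(((1 == i) OrElse (2 == i)) OrElse ((3 == i) OrElse (4 == i)))"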

We later send a tree like this to the ADO.NET provider, which may recognize this pattern and reduce it to an IN clause during SQL generation.

When we added support for Enumerable.Contains in EF4, we thought it was desirable to do it without having to introduce support for IN expressions in the provider model, and honestly, 10,000 is far more than the number of elements we anticipated customers would pass to Enumerable.Contains. That said, I understand that this is an annoyance and that the manipulation of expression trees makes things too expensive in your particular scenario.

I discussed this with one of our developers, and we believe that in the future we could change the implementation by adding first-class support for IN. I will make sure this is added to our backlog, but I cannot promise when it will make it in, given the many other improvements we would like to make.

To the workarounds already suggested in the thread I would add the following:

Consider creating a method that balances the number of database roundtrips against the number of elements you pass to Contains. For instance, in my own testing I observed that computing and executing the query with 100 elements against a local instance of SQL Server takes 1/60 of a second. If you can write your query in such a way that executing 100 queries with 100 different sets of ids gives you a result equivalent to the query with 10,000 elements, then you can get the results in approximately 1.67 seconds instead of 18 seconds.

Different chunk sizes will work better depending on the query and the latency of the database connection. For certain queries, e.g. if the sequence passed has duplicates or if Enumerable.Contains is used in a nested condition, you may obtain duplicate elements in the results.

Here is a code snippet. Sorry if the code used to slice the input into chunks looks a little too complex: there are simpler ways to achieve the same thing, but I was trying to come up with a pattern that preserves streaming for the sequence, and I couldn't find anything like it in LINQ, so I probably overdid that part. :)

Usage:

var list = context.GetMainItems(ids).ToList(); 

Method for context or repository:

public partial class ContainsTestEntities
{
    public IEnumerable<Main> GetMainItems(IEnumerable<int> ids, int chunkSize = 100)
    {
        foreach (var chunk in ids.Chunk(chunkSize))
        {
            // Materialize the chunk, and skip the empty chunk that can
            // trail a sequence whose length is an exact multiple of chunkSize.
            var chunkIds = chunk.ToArray();
            if (chunkIds.Length == 0)
            {
                continue;
            }
            var q = this.MainItems.Where(a => chunkIds.Contains(a.Id));
            foreach (var item in q)
            {
                yield return item;
            }
        }
    }
}

Extension methods for slicing enumerable sequences:

public static class EnumerableSlicing
{
    private class Status
    {
        public bool EndOfSequence;
    }

    private static IEnumerable<T> TakeOnEnumerator<T>(IEnumerator<T> enumerator, int count,
        Status status)
    {
        // Yield up to 'count' elements from the shared enumerator; when the
        // underlying sequence runs out, set the flag so the outer loop stops.
        while (count-- > 0 && (enumerator.MoveNext() || !(status.EndOfSequence = true)))
        {
            yield return enumerator.Current;
        }
    }

    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> items, int chunkSize)
    {
        if (chunkSize < 1)
        {
            throw new ArgumentException("Chunks should not be smaller than 1 element");
        }
        var status = new Status { EndOfSequence = false };
        using (var enumerator = items.GetEnumerator())
        {
            // All chunks stream from this single enumerator, so each chunk
            // must be fully consumed before the next one is requested.
            while (!status.EndOfSequence)
            {
                yield return TakeOnEnumerator(enumerator, chunkSize, status);
            }
        }
    }
}
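For example, on a plain in-memory sequence the extension behaves like this (illustrative values; remember that the chunks share one enumerator, so consume each fully before moving on):

foreach (var chunk in Enumerable.Range(1, 10).Chunk(4))
{
    Console.WriteLine(string.Join(", ", chunk));
}
// Output:
// 1, 2, 3, 4
// 5, 6, 7, 8
// 9, 10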

Hope this helps!

answered by divega


If you hit a performance problem like this that is blocking for you, don't spend ages trying to solve it, because you will most probably not succeed: you would have to take it up with Microsoft directly (if you have premium support), and that takes ages.

Use a workaround instead, and in the case of a performance issue with EF, a workaround means direct SQL. There is nothing bad about that; the notion that using EF means never using SQL again is a myth. You have SQL Server 2008 R2, so:

  • Create a stored procedure accepting a table-valued parameter to pass your ids (see the sketch below)
  • Let your stored procedure return multiple result sets to emulate Include logic in an optimal way
  • If you need some complex query building, use dynamic SQL inside the stored procedure
  • Use SqlDataReader to get the results and construct your entities
  • Attach them to the context and work with them as if they were loaded by EF

If performance is critical for you, you will not find a better solution. The procedure itself cannot be mapped and executed by EF, because the current version supports neither table-valued parameters nor multiple result sets.
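A minimal sketch of the table-valued-parameter part of this approach follows. The type dbo.IdList, the procedure dbo.GetMainByIds, the connection string, and the shape of the Main entity are all hypothetical placeholders, not names from the question:

-- T-SQL, created once (hypothetical names):
-- CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
-- CREATE PROCEDURE dbo.GetMainByIds @Ids dbo.IdList READONLY
-- AS SELECT m.Id FROM dbo.Main m JOIN @Ids i ON i.Id = m.Id;

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class MainRepository
{
    public static List<Main> GetMainByIds(string connectionString, IEnumerable<int> ids)
    {
        // Pack the ids into a DataTable matching the table type's single INT column.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        foreach (var id in ids)
        {
            table.Rows.Add(id);
        }

        var result = new List<Main>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.GetMainByIds", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            var p = command.Parameters.Add("@Ids", SqlDbType.Structured);
            p.TypeName = "dbo.IdList";
            p.Value = table;

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Construct each entity by hand from the reader.
                    result.Add(new Main { Id = reader.GetInt32(0) });
                }
            }
        }
        return result;
    }
}

// Afterwards, attach the entities so EF tracks them as if it had loaded them:
//   foreach (var m in rows) { context.MainItems.Attach(m); }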

answered by Ladislav Mrnka