In a heavily multi-threaded scenario, I have problems with a particular EF query. It's generally cheap and fast:
Context.MyEntity
.Any(se => se.SameEntity.Field == someValue
&& se.AnotherEntity.Field == anotherValue
&& se.SimpleField == simpleValue
// few more simple predicates with fields on the main entity
);
This compiles into a very reasonable SQL query:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM (SELECT [Extent1].[Field1] AS [Field1]
FROM [dbo].[MyEntity] AS [Extent1]
INNER JOIN [dbo].[SameEntity] AS [Extent2] ON [Extent1].[SameEntity_Id] = [Extent2].[Id]
WHERE (N'123' = [Extent2].[SimpleField]) AND (123 = [Extent1].[AnotherEntity_Id]) AND -- further simple predicates here -- ) AS [Filter1]
INNER JOIN [dbo].[AnotherEntity] AS [Extent3] ON [Filter1].[AnotherEntity_Id1] = [Extent3].[Id]
WHERE N'123' = [Extent3].[SimpleField]
)) THEN cast(1 as bit) ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
The query, in general, has optimal query plan, uses the right indices and returns in tens of milliseconds which is completely acceptable.
However, when a critical number of threads (<=40) starts executing this query, the performance on it drops to tens of seconds.
There are no locks in the database, no queries are writing data to these tables and it reproduces very well with a database that's practically isolated from any other operations. The DB resides on the same physical machine and the machine is not overloaded at any point, i.e. has plenty of spare CPU, memory and other resources the CPU is overloaded by this operation.
Now what's really bizarre is that when I replace the EF Any()
call with Context.Database.ExecuteSqlCommand()
with the copy-pasted SQL (also using parameters), the problem magically disappears. Again, this reproduces very reliably - replacing the Any()
call with copy-pasted SQL increases the performance by 2-3 orders of magnitude .
An attached profiler (dotTrace) sampling shows that the threads seem to all spend their time in the following method:
Is there anything I've missed or did we hit some ADO.NET / SQL Server cornercase?
MORE CONTEXT
The code running this query is a Hangfire job. For the purpose of test, a script queues a lot of jobs to be performed and up to 40 threads keep processing the job. Each job uses a separate DbContext
instance and it's not really being used a lot. There are a few more queries before and after the problematic query and they take expected times to execute.
We're using many different Hangfire jobs for similar purposes and they behave as expected. Same with this one, except when it gets slow under high concurrency (of exact same jobs). Also, just switching to SQL on this particular query fixes the problem.
The profiling snapshot above is representative, all the threads slow down on this particular method call and spend the vast majority of their time on it.
UPDATE
I'm currently re-running a lot of those checks for sanity and errors. The easy reproduction means it's still on a remote machine to which I can't connect using VS for debugging.
One of the checks showed that my previous statement about free CPU was false, the CPU was not entirely overloaded but multiple cores were in fact running on full capacity for the whole duration of the long running jobs.
Re-checking everything again and will come back with updates here.
Answer. Entity Framework loads very slowly the first time because the first query EF compiles the model. If you are using EF 6.2, you can use a Model Cache which loads a prebuilt edmx when using code first; instead, EF generates it on startup.
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
SQL Server 2017 is the fastest database everywhere you need it. Whether it is your laptop, in your private cloud, or in our Azure public cloud infrastructure. Whether it is running on Linux, Windows, or Docker Containers, we have the speed to power any workload your application needs.
Can you try as shown below and see whether is there any performance improvement or not ...
Context.MyEntity.AsNoTracking()
.Any(se => se.SameEntity.Field == someValue
&& se.AnotherEntity.Field == anotherValue
&& se.SimpleField == simpleValue
);
Check if you are reusing the context in a loop. Doing so may create many objects during your performance test and giving the garbage collector a lot of work to do.
Faulty initial assumptions. The SQL in the question was obtained by pasting the code into LINQPad and having it generate the SQL.
After attaching an SQL profiler to the actual DB used, it showed a slightly different SQL involving outer joins, which are suboptimal and didn't have a proper index in place.
It remains a mystery why LINQPad generated different SQL, even though it's using the same EntityFramework.dll
, but the original problem is resolved and all that remains is to optimize the query.
Many thanks for everyone involved.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With