Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

EF 4.1 Code-first executes queries 3x slower than regular EF in my application

I have a pet project (a simple forum application) that I use to test out all the latest .NET tech and I recently got around to toying with Entity Framework Code-First. This app already had an existing EF solution with an EDMX file mapped to an existing database and all my entities were auto-generated. This solution has worked great so far.

Note: Keep in mind that this change to EF 4.1 is purely for learning. If you are wondering what my needs were that caused me to upgrade, there weren't any. I simply wanted to do it for fun.

I copied the project and did the upgrades so I would have the same project but with different Entity Framework implementations. In the new project I used a Visual Studio extension called Entity Framework Power Tools to generate POCOs and a DbContext from my existing database. Everything worked flawlessly. I had the app compiling in about 30 minutes time. Pretty impressive.

However, I noticed now when running the app that the query execution is approximately 3 times slower than it was before. Any idea what I could have missed?

Below are the details for both solutions, as well as LINQPad measurements for both. (click images for full size)

EF 4.0 Details

Here is a snapshot of my EF 4.0 data model. It cuts off a few entities on top and bottom but you get the idea.

http://www.codetunnel.com/content/images/EF41question/1.jpg Here is a LINQPad test against my EF 4.0 data model.

http://www.codetunnel.com/content/images/EF41question/2.jpg Notice that the query took 2.743 seconds to execute.

EF 4.1 Details

Here is a snapshot of my EF 4.1 data model. Since it's code-only I will show the DbContext class as well as one of the mapping classes (fluent API code) for one entity, and one entity itself.

DbContext http://www.codetunnel.com/content/images/EF41question/3.jpg TopicMap (fluent API configuration) http://www.codetunnel.com/content/images/EF41question/4.jpg Topic (POCO entity) http://www.codetunnel.com/content/images/EF41question/5.jpg Here is a LINQPad test against my EF 4.1 model.

http://www.codetunnel.com/content/images/EF41question/6.jpg Notice this time that the query took 6.287 seconds to execute and it was the exact same query. It takes over 30 seconds the very first time it is run. If I go to the SQL and IL tabs in LINQPad the generated SQL and the IL code are identical for both data models. This is really giving me grief. In the actual application things are so slow with EF 4.1 that it is unusable.

I ran the same LINQ query against both models. The query grabs all topics for a regular forum user, orders them in descending order by their last reply date (or the topic post date if there are no replies).

Obviously I can just go back to EF 4.0 and go about my merry way but I'm really interested if there might be something I missed.

like image 578
Chev Avatar asked Sep 29 '11 23:09

Chev


2 Answers

UPDATE

I'm completely revisiting this answer because of some recent developments.

Because of some inquiry by the Entity Framework team at Microsoft trying to duplicate my issue, I went back and retraced my steps to better help narrow down the problem. It's been a while since I asked this question and I understand things much better now than I did then.

Rather than go back and try to get some very old code running I decided to start from scratch with a simple test project. I put together a simple database with two tables and mapped them to an EF 4.0 designer file.

This generated a connection string like this:

<add name="EFTestEntities" connectionString="metadata=res://*/Entities.csdl|res://*/Entities.ssdl|res://*/Entities.msl;provider=System.Data.SqlClient;provider connection string=&quot;data source=.\sqlexpress;initial catalog=EFTest;integrated security=True;multipleactiveresultsets=True;App=EntityFramework&quot;" providerName="System.Data.EntityClient" />

I then populated the database with 1000 rows of test topics and 10 rows of replies for each topic. Once I had this working I timed a very basic query fairly similar to the one in my main question. Then I duplicated the test project and I modified it using the Entity Framework Power Tools extension to generate my model objects and DbContext. The only thing I modified was the connection string to remove the metadata that is referenced when there is a designer file in the project so it looked like this:

<add name="EFTestContext" providerName="System.Data.SqlClient" connectionString="Data Source=.\sqlexpress;Initial Catalog=EFTest;Integrated Security=True;Pooling=False" />

I then ran the exact same query as I did with the designer.

There was no difference in query times except for the slightly extra time it takes for the code-first to generate the mapping meta-data. After that initial query the two versions of EF performed pretty much the same. I was about to resolve the problem as not reproducible but then I notice I did something horrible in the question. I called .AsEnumerable() before my queries. If you don't already know what that does, that will cause the ENTIRE entity collection to be pulled into memory and then the query would be applied there as LINQ-to-Objects rather than LINQ-to-Entities.

This means that I was sucking an entire table into memory and then doing LINQ against it there. In cases where SQL server is on the same machine as your website you might not notice the difference but there are many cases where this would be a huge issue. In my case it really was causing a performance loss.

I went back to my tests and I ran them with .AsEnumerable() placed before the queries.

Now I expected the time to be slower since my LINQ queries weren't being translated into expression trees and executed in the database. However, it seems I did reproduce the issue in my question. The code-only version is returning much slower. This is actually pretty strange because they both should be running the same. I am not surprised that they are running slower than when the queries were against IQueryable, but now that they are running against IEnumerable there is a big difference between the two. I was able to expand the difference between the two by adding more and more data to the table.

I went ahead and added 5000 more topics to the database, with 30 replies for each topic. So there is now a total of 6000 topic rows and 165000 reply rows. First I ran the query with proper LINQ-to-Entities:

As you can see, still no difference. Then I ran the queries against with LINQ-to-Objects using .AsEnumerable().

I stopped it after three queries because waiting about two minutes per query was excruciating. I can't seem to produce the 3x as slow issue that I show in my question, but code-only is significantly slower. The EDMX approach takes just shy of two minutes to complete one query while the code-only approach consistently takes over two minutes.

like image 52
7 revs, 2 users 82% Avatar answered Nov 15 '22 14:11

7 revs, 2 users 82%


If I don't misunderstand you have two situations and in both situations you compare the performance of a query:

  1. In your question you compare the performance of a query with the ObjectContext API in EF 4.0 with the performance of the same query with DbContext API using Code-First in EF 4.1. Now if I saw right in your code repositories for the EF 4.0 approach you have used EntityObject derived entities and not POCOs (generated from the T4 POCO generator for EF 4.0) whereas in your EF 4.1 solution you have POCOs.

    My hypothesis is that this makes the difference. If you are using EntityObject derived entities your objects are capable to track their own changes. If you use POCOs on the other hand EF will create property snapshots of every materialized entity in the context which is not necessary for EntityObjects. This snapshot creation might take a serious amount of time.

    So this is not really a comparison between EF 4.0 and EF 4.1 but between a POCO and non-POCO approach. (Working with POCOs (which are not prepared for change tracking proxies, i.e. every property is virtual) in Entity Framework is slower in every aspect. If performance matters you must avoid them.)

  2. Now, what you mention in the UPDATE to your own answer is interesting. If I understand right you compare here EF 4.1 with DbContext using Code-First (OnModelCreating has entity configurations and the connection string doesn't have references to the EDMX file) and EF 4.1 with DbContext using Database-First (OnModelCreating is empty (or only throws this UnintentionalCodeFirstException from the DbContext generator) and the connection string has a reference to the EDMX file).

    In both cases you are using the same POCO entities. Here the performance difference is surprising to me and I don't have an immediate hypothesis why this happens. You didn't provide exact measurements though (as in the situation in your question) which would be interesting.

    I'm not surprised that the first query is faster with the EDMX approach because EF only needs to read and process an XML file to build the model in memory whereas without EDMX it must reflect over the code in the assembly which is probably slower. But once the model is built in memory (so starting with the second query latest) my understanding was that EF will only work with that in-memory representation of the model, no matter what's the source of the metadata. If you really have a big performance difference in that situation I am stumped.

Edit

To summarize the comments: We are talking about point 2 above which means that you compare EF 4.1 Database-First (EDMX specified in connection string, OnModelCreating is empty) with EF 4.1 Code-First (EDMX not specified in connection string, OnModelCreating contains model configuration). Result:

Your query with EF 4.1 Database-First is three times faster than with EF 4.1 Code-First.

For a simple model I could not reproduce the behaviour. I have created one with Code-First and tested the query time with dummy data (300000 users ("Spock 1" to "Spock 300000") in User table, Contains("pock")-query over user name, so that all 300000 users must be returned and materialized, needs 3.2 sec). Then I have generated an EDMX from the code-first model:

using (var context = new MyEntities())
{
    using (var writer = new XmlTextWriter("model.edmx", Encoding.Default))
    {
        EdmxWriter.WriteEdmx(context, writer);
    }
}

The resulting EDMX file I've added to the project, modified the connection string to include the EDMX metadata, and run the query again. The query time was almost exactly the same (3.2 sec).

So, the reason for the problem is apparently not so obvious. It could have to do with the complexity of the model or the query though.

like image 35
Slauma Avatar answered Nov 15 '22 14:11

Slauma