Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Entity Framework returning bad data

I have an Entity Framework 6.1 project that is querying a SQL Server 2012 database table and getting back incorrect results.

To illustrate what is happening, I created 2 queries that should have the exact same results. The table ProjectTable has 23 columns and 20500ish rows:

var test1 = db.ProjectTable
    .GroupBy(t => t.ProjectOwner)
    .Select(g => g.Key)
    .ToArray();

var test2 = db.ProjectTable
    .ToArray()
    .GroupBy(t => t.ProjectOwner)
    .Select(g => g.Key)
    .ToArray();

The queries are designed to get a list of all of the distinct project owners in the table. The first query does the heavy lifting on the SQL Server, where as the second query downloads the entire table into memory and then processes it on the client side.

The first variable test1 has a length of about 300 items. The second variable test2 has a length of 5.

Here are the raw SQL queries that EF is generating:

-- test1
SELECT [Distinct1].[ProjectOwner] AS [ProjectOwner]
FROM ( SELECT DISTINCT
       [Extent1].[ProjectOwner] AS [ProjectOwner]
       FROM [dbo].[ProjectTable] as [Extent1]
     ) AS [Distinct1]

-- test2
SELECT Col1, Col2 ... ProjectOwner, ... Col23
FROM [dbo].[ProjectTable]

When I run this query and analyze the returned entities, I notice that the full 20500ish rows are returned, but the ProjectOwner column gets overridden with one of only 5 different users!

var test = db.ProjectTable.ToArray();

I thought that maybe it was the SQL Server, so I did a packet trace and filtered on TDS. Randomly looking through the raw streams I see many names that aren't in the list of 5, so I know that data is being sent across the wire correctly.

How do I see the raw data that EF is getting? Is there something that might be messing with the cache and pulling incorrect results?

If I run the queries in either SSMS or Visual Studio, the list returned is correctly. It is only EF that has this issue.

EDIT

Ok, I added another test to make sure my sanity is in check. I took the test2 raw sql query and did the following:

var test3 = db.Database
    .SqlQuery<ProjectTable>(@"SELECT Col1..Col23")
    .ToArray()
    .Select(t => t.ProjectOwner)
    .Distict()
    .ToArray();

and I get the correct 300ish names back!

So, in short:

  1. Having EF send projected DISTINCT query to SQL Server returns the correct results
  2. Having EF select the entire table and then using LINQ to project and DISTINCT the data returns incorrect results
  3. Giving EF THE EXACT SAME QUERY!!! that bullet #2 generates and doing a raw SQL query, returns the correct results
like image 247
joe_coolish Avatar asked Feb 24 '15 16:02

joe_coolish


1 Answers

After downloading the Entity Framework source and stepping through many an Enumerator, I found the issue.

In the Shaper.HandleEntityAppendOnlymethod (found here), on line 187 the Context.ObjectStateManager.FindEntityEntry method is called. To my surprise, a non-null value was returned! Wait a minute, there shouldn't be any cached results, since I'm returning all rows?!

That's when I discovered that my Table has no Primary Key!

In my defence, the table is actually a cache of a view that I'm working with, I just did a SELECT * INTO CACHETABLE FROM USERVIEW

I then looked at which column Entity Framework thought was my Primary Key (they call it a singleton key) and it just so happens that the column they picked had only... drum roll please... 5 unique values!

When I looked at the model that EF generated, sure enough! That column was specified as a primary key. I changed the key to the appropriate column and now everything is working as it should!

like image 112
joe_coolish Avatar answered Sep 29 '22 12:09

joe_coolish