I'm pulling some data from SQL using Entity Framework.
I've written some code that looks like the snippet below. Note that Something
is a varchar
value from the database. Furthermore, I think it may be relevant that every value in Something
consists of ten digits, a hyphen, and then two more digits, such as "0123456789-01". (Don't remind me this is bad design; I'm not the architect of this database.)
var X = Entities.MyView
.Select(x => x.Something)
.Distinct();
// 5850 elements
var Y = Entities.MyView
.GroupBy(x => x.Something);
// 5850 elements
var Z = Entities.MyView
.ToArray()
.GroupBy(x => x.Something);
// 5727 elements
// Added in response to user995219's Answer
var ZZ = Entities.MyView
.GroupBy(x => x.Something)
.ToArray();
// 5850 elements
The first statement pulls down the distinct values of the Something
column from the view MyView
. It gets me the result 5850 elements, which is what I expect.
The second statement pulls down the whole view and groups it by unique values of Something
, yielding 5850 groupings, which is what I expect.
The third statement is exactly the same, except it calls ToArray
on the values from MyView
. Normally, I would do this when I expect the calling code to use the entirety of the data and don't want to deal with any Lazy Loading SNAFUs. However, when I run this code in the debugger, this third statement yields 5727 groupings, 123 less than I expect.
Edit:The fourth statement simply reverses the order of the GroupBy
and ToArray
method chaining. Making this small change, I get the number of rows that I expect, but the real problem that this question cannot be solved this way because the real problem has code that associates data from this SQL call with other data and returns them in a single object.
I wish to understand how it is that using ToArray
causes me to get the wrong number of results.
Follow-up: In response to user995219's answer, I rewrote MyView
so that it has an additional column: SomethingInt
, which simply containins a 32-bit integer formed by omitting the hyphen from Something
and considering the result as a single integer. Then, the LINQ code does the GroupBy
on the newly minted integer.
However, even with this change, I'm still having the same problem.
var A = Entities.MyView
.ToArray();
// Returns 17893 elements, the same as if I ran SELECT * FROM MyView
var array0 = A.Select(x => x.SomethingInt).Distinct();
// Returns 5727 elements when I expect 5850
In SQL, SELECT COUNT(DISTINCT(SomethingInt)) FROM MyView;
returns 5850, as I expect.
This should take any kind of string-comparison in the .net framework out of the picture, but the problem persists.
The ToArray statement executes the query and the group by is performed in memory using string equality comparer. The first two statements perform the group by at SQL level. There are differences: String Comparison differences between .NET and T-SQL?
Particularly concerning collation.
I found the answer. user995219's answer was useful, but not the full explanation.
Apparently, LINQ methods check the contents of what they are operating on. In my case, I'm using classes generated by Entity Framework. These have "Entity Keys", which allow the .net framework to distinguish between two rows that have the same contents and two instances of the same row.
In my case, I was using a complicated view and the .net framework inferred the entity keys incorrectly and then discarded rows because it thought they were the same.
The solution for me was to modify my view so that there is a GUID that uniquely identifies each row and use the GUIDs as an entity key.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With