Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why does LINQ use the wrong datatype in my query while it's declared correctly in the EF schema?

My database schema uses varchar as default. With an EF(6) code first approach, I made sure my model is correct by setting the ColumnType for strings to varchar: modelBuilder.Properties<string>().Configure(p => p.HasColumnType("varchar"));

I'm using a PredicateBuilder to build my where clause and all works as expected; LINQ creates a parameterized query with the varchar datatype. I have also tried without the PredicateBuilder: the exact same issue arrises.

But once I add a Select statement, suddenly LINQ decides to change the datatype to nvarchar with no reason that I can think of. This of course has a seriously negative impact on my query, as sql server now has to do a bunch of implicit converts, rendering my indexes useless. It's now scanning the table instead of seeking.

var ciPredicate = PredicateBuilder.New<InfoEntity>(true);
ciPredicate = ciPredicate.And(x => x.InfoCode == ciCode);
ciPredicate = ciPredicate.And(x => x.Source == source);

//varchar - N'@p__linq__0 varchar(8000),@p__linq__1 varchar(8000)'
var ciQuery2 = this.Scope.Set<InfoEntity>().Where(ciPredicate).ToList();

//varchar - N'@p__linq__0 varchar(8000),@p__linq__1 varchar(8000)'
var ciQuery3 = this.Scope.Set<InfoEntity>().Where(ciPredicate).GroupBy(x => new { x.Source, x.InfoKey }).ToList();

//varchar - N'@p__linq__0 varchar(8000),@p__linq__1 varchar(8000)'
var ciQuery4 = this.Scope.Set<InfoEntity>().Where(ciPredicate).GroupBy(x => new { x.Source, x.InfoKey }).ToList().Select(group => group
                    .OrderByDescending(x => x.InfoSeqNr)
                    .FirstOrDefault()
                );

//nvarchar - N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)'
var ciQueryNvarchar = this.Scope.Set<InfoEntity>().Where(ciPredicate).GroupBy(x => new { x.Source, x.InfoKey })
                .Select(group => group
                    .OrderByDescending(x => x.InfoSeqNr)
                    .FirstOrDefault()
                ).ToList();

table definition:

CREATE TABLE Info(
  Id int NOT NULL,
  InfoKey int NOT NULL,
  Source varchar(50) NOT NULL,
  InfoCode varchar(50) NOT NULL,
  InfoDesc varchar(4000) NOT NULL,
  InfoSeqNr int NOT NULL
)

Since this is just the start of a query we can not use ciQuery4 with the ToList() in between.

I can't for the life of me figure out why this is happening, any help would be greatly appreciated.

like image 489
Andy Avatar asked Nov 07 '22 07:11

Andy


1 Answers

This seems like a bug in EF... Or is it really?

I was able to reproduce this with EF6.4.4 with the select statement on your GroupBy clause. Single column GroupBy's and composite column GroupBy's don't seem to make a difference.

But when I changed the Select-clause to something like this

var ciQueryNvarchar = this.Scope.Set<InfoEntity>().Where(ciPredicate).GroupBy(x => new { x.Source, x.InfoKey })
                .Select(group => new {
                    Group = group.Key,
                    MostRecentItem = group.OrderByDescending(x => x.InfoSeqNr).FirstOrDefault()
                }).ToList();

EF no longer uses nvarchar(4000) in the generated SQL statement. So this could be a workaround, but it ain't pretty...

A better alternative would be to rewrite the query altogether, because LINQ GroupBy and SQL GROUP BY are different things. LINQ's GroupBy is some sort of sort + bucketing operation, while SQL's GROUP BY also combines an aggregation function, which we don't have in this case.

Hence, the above query can be rewritten as

var ciQueryNvarchar = this.Scope.Set<InfoEntity>()
    .Where(ciPredicate)
    .Select(x => new { x.Source, x.InfoKey })
    .Distinct()
    .Select(x => this.Scope.Set<InfoEntity>()
        .Where(ciPredicate)
        .Where(y => y.Source == x.Source && y.InfoKey == x.InfoKey)
        .OrderByDescending(y => y.InfoSeqNr)
        .FirstOrDefault()
    )
    .ToList();

When comparing the query plans, both the initial query and the rewritten query have nearly identical generated SQL, but the interesting part is in the difference which could explain the type change to nvarchar.

enter image description here

In .NET, all strings are Unicode (nvarchar). So if the generated query contains SELECT @p__linq__0 AS [p__linq__0], @p__linq__1 AS [p__linq__1], EF's best guess would be to use nvarchar for this column to match .NET's unicode strings.

So it might not be a bug after all?

Yet another alternative could be resorting to plain SQL (which would also make for a much more readable query) but that might not be ideal either (in the light of rename refactorings and database schema changes).

like image 153
mstaessen Avatar answered Nov 12 '22 21:11

mstaessen