Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

EF6 SQL generation for <where nullable columns equals>

Trying to upgrade from EF5 to EF6, I encounter significant performance gap searching table by nullable column. Here is a sample:

public class Customer
{
    public int Id { get; set; }
    public int? ManagerId { get; set; }
    //public virtual Manager Manager { get; set; }
}

public class MyContext : DbContext
{
    public MyContext(string connstring): base(connstring){}
    public DbSet<Customer> Customers { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var db = new MyContext("CONNSTRING");
        var managerId = 1234;
        var q = from b in db.Customers
                where b.ManagerId == managerId
                select b.Id;
        var s = q.ToString();
    }
}

When EF6 generates SQL, it adds some logic for null handling:

SELECT 
[Extent1].[Id] AS [Id]
FROM [dbo].[Customers] AS [Extent1]
WHERE (([Extent1].[ManagerId] = @p__linq__0) 
AND ( NOT ([Extent1].[ManagerId] IS NULL OR @p__linq__0 IS NULL))) 
OR (([Extent1].[ManagerId] IS NULL) AND (@p__linq__0 IS NULL))

Notice same linq produced more simple SQL under EF5:

SELECT 
[Extent1].[Id] AS [Id]
FROM [dbo].[Customers] AS [Extent1]
WHERE [Extent1].[ManagerId] = @p__linq__0

I can understand the point developers tried to achieve: if you supply null as parameter, the query where managerId = null will not select any lines. I appreciate the care, but 99.9% times search logic is separated: one usecase looks for where ManagerId == null, another searches for specific id where ManagerId == managerId

The problem is in big performance impact: MS SQL does not use index on ManagerId and table scan occurs. My project has hundreds of similar searches and on database size about 100GB overall performance after upgrade to EF6 cut by 10 approximately.

The question is does anybody know some kind of configuration or convention to disable this roadblock in EF6 and generate simple sql?

EDIT:

I checked a dozen similar selects in my project and found that:

  • In some cases SQL SERVER does use the index specified for the field I search. And even in this cases there is slight performance loss: it uses index twice: first time looking for value I specified in parameter, second time looking for null
  • EF6 even checks for null when constant is exactly specified as not null, for example:

                from p in db.PtnActivations
            where p.Carrier != "ALLTEL"
            where p.Carrier != "ATT"
            where p.Carrier != "VERIZON"
    

generates SQL

    WHERE ( NOT (('ALLTEL' = [Extent1].[Carrier]) AND ([Extent1].[Carrier] IS NOT NULL))) AND ( NOT (('ATT' = [Extent1].[Carrier]) AND ([Extent1].[Carrier] IS NOT NULL))) AND ( NOT (('VERIZON' = [Extent1].[Carrier]) AND ([Extent1].[Carrier] IS NOT NULL)))

that did not utilize my index on carrier. EF5 version had

( NOT (('ALLTEL' = [Extent1].[Carrier]))) AND ( NOT (('ATT' = [Extent1].[Carrier]))) AND ( NOT (('VERIZON' = [Extent1].[Carrier]) ))

that utilized it.

Notice the condition ('ALLTEL' = [Extent1].[Carrier]) AND ([Extent1].[Carrier] IS NOT NULL). Second part is always false, but adding this part gives up the index.

My routine import of about 1.7M records (that usually had taken about 30 minutes) is up for 3 hours and progress is about 30%.

like image 916
Ben Avatar asked Oct 30 '13 02:10

Ben


2 Answers

Set

db.Configuration.UseDatabaseNullSemantics = true;

to get the behavior you had in EF5. This workitem describes what the difference between true and false is and should help you decide whether you are OK with the old behavior or not.

like image 69
Pawel Avatar answered Oct 22 '22 14:10

Pawel


Very different answer.

If you are using a varchar(xxx) the LNQ to SQL spits out nvarchar(4000), which breaks indexes and conversions massively blowing out your sql plan. In my case I found this question due to the odd null behavior, but that was not the issue. Answer below solves both null and nvarchar issue. SQL plan went from ~11 to .006.

public class InterestingRow
{
    [Key]
    public int interesting_row_id { get; set; }

    [StringLength(255), Required, Column(TypeName = "varchar")]
    public string public_guid { get; set; }
}

(Yes there are many reasons to use varchar, like you are storing a publicly exposed guid)

like image 28
Aaron Sherman Avatar answered Oct 22 '22 13:10

Aaron Sherman