Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow query caused by parameter variables, but why?

I have 3 versions of a query that ultimately returns the same results.

One of them gets dramatically slower when adding an extra inner join to a relatively small table AND where parameter variables are used within the where clause.

The execution plan is very different for the fast and slow queries (included below each query).

I want to understand why this happened and how to prevent it.

This query takes < 1 second. It does not have an extra inner join but it uses parameter variables in the where clause.

    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct sups.campaignid 
    from tblSupporterMainDetails sups
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Parallelism(Gather Streams)
       |--Sort(DISTINCT ORDER BY:([sups].[campaignID] ASC))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
                 |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
                      |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
                           |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1004]))
                           |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
                           |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
                           |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
                                |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1004],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))

This query takes < 1 second. It has an extra inner join BUT uses parameter constants in the where clause.

    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between '20120115' and '20120116'

  |--Parallelism(Gather Streams)
       |--Hash Match(Right Semi Join, HASH:([sups].[campaignID])=([camps].[campaignID]))
            |--Bitmap(HASH:([sups].[campaignID]), DEFINE:([Bitmap1007]))
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
            |         |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
            |              |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
            |                   |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1006]))
            |                   |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
            |                   |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
            |                   |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
            |                        |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1006],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([camps].[campaignID]))
                 |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]),  WHERE:(PROBE([Bitmap1007],[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID],N'[IN ROW]')))

This query takes 2 minutes. It has an extra inner join AND it uses parameter variables in the where clause.

    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
       |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
       |--Top(TOP EXPRESSION:((1)))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([calls].[callID], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterID_closingCall] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[AUTOGEN_PK_tblCallLogs] AS [calls]), SEEK:([calls].[callID]=[GOGEN].[dbo].[tblCallLogs].[callID] as [calls].[callID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) LOOKUP ORDERED FORWARD)

Notes:

  • I believe that the slowness is being caused by the Clustered Index Seek on tblCallLogs, however I don't know why SQL Server would choose this execution plan.
  • Should I be using a query optimiser hint? I have needed to and am reluctant to tell SQL Server how to do its job...
  • The problem seems to be caused by a combination of factors - an extra join AND variables.
  • Could the execution plan be trying to reuse a 'bad' plan when it finds variables for the query?
  • In real life I will have to use parameter variables. Constants are no good! So this problem could exist in many of my queries/stored procedures!
  • I have rebuilt indexes and updated statistics on tblCampaigns and tblSupporterMainDetails. This had no effect.
  • Both tables have clustered indexes on the primary key (identity integer).
  • The foreign key column campaignid is indexed.
  • All queries use the same parameter values - weather it be used as a variable or a constant.

Number of records in tables:

  • tblSupporterMainDetails = 12,561,900
  • tblCallLogs = 27,242,224
  • tblCampaigns = 756

UPDATE:

  • I have also rebuilt indexes and updated statistics on tblcalllogs. No effect.
  • I have cleared the execution plan cache using DBCC FREEPROCCACHE
  • tblCallLogs.callEnd is a datetime.

Schemas of involved columns:

tblCampaign.campaignid int not null
tblSupporterMainDetails.campaignid int not null
tblSupporterMainDetails.supporterid int not null
tblCallLogs.supporterid int not null
tblCallLogs.callEnd datetime not null

Indexes:

Indexes

UPDATE 2: After adding index to tblCallLogs.supporterId - with include column: callEnd
The 'slow' query speeded up to 40 seconds. the updated execution plan:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
   |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
   |--Top(TOP EXPRESSION:((1)))
        |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
             |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
             |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterid_callend] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) ORDERED FORWARD)

SOLUTION:

The extra join was not actually causing the problem directly, however it obviously changed the statement so that sql server held a different execution plan for it.
By adding OPTION(RECOMPILE) to the end of the slow statement, I was able get the expected fast performance. i.e. < 1 second. I am still unsure exactly this solution worked - why didnt flushing all plan work? is this a classic case of parameter sniffing? I will update this post as I learn the exact answer - or until someone can give a clear answer. Thanks to both @LievenKeersmaekers and @JNK for helping so far...

like image 924
make_transition Avatar asked Mar 28 '13 13:03

make_transition


People also ask

What are some potential reasons that the query is slow?

Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.

Are parameterized queries faster?

"parameterized queries typically execute much faster than a literal SQL string because they are parsed exactly once (rather than each time the SQL string is assigned to the CommandText property)."

Why is my database query slow?

Slow queries are frequently caused by combining two or more large tables together using a JOIN. Review the number of joins in your query, and determine if the query is pulling more information than is actually needed.


1 Answers

A summary of what lead to a solution:

Add a covering index on supporterid, callEnd.

The assumption here is that the optimizer can use this index (in contrast with callEnd, supporterid) to

  • first join tblSupporterMainDetailsand tblCallLogs
  • further use it in the where clause for selecting callEnd

Add the option OPTION(RECOMPILE)

all cudo's to TiborK and Hunchback for explaining the difference to the optimizer of using hard coded constants or variables.

Performance Impact - Constant value -vs- Variable

When you use the constant, the value is known to the optimizer so it can determine selectivity (and possible index usage) based on that. When you use a variable, the value is unknown to the optimizer (so it have to go by some hardwired value or possibly density info). So, technically, this isn't parameter sniffing, but whatever article you find on that subject should also explain the difference between a constant and a variable. Using OPTION(RECOMPILE) will actually turn the variabe to a parameter sniffing situation.

In essence, there is a big difference between a constant, a variable and a paramater (whcih can be sniffed).

like image 64
Lieven Keersmaekers Avatar answered Oct 27 '22 01:10

Lieven Keersmaekers