
Operator used tempdb to spill... with variables but not literals

Tags: sql, sql-server

Need help understanding this SQL Server behavior

I have a fairly basic query like

select x, y, sum(z) 
from table 
where date between @start and @end
group by x, y

There's a large number of rows involved: the filter condition retrieves 6 million of the table's 16 million rows.

The thing I don't understand is: this query is slow and I get a warning about spilling to tempdb. But if I change it and simply replace @start and @end with the same dates directly, it's much faster and there's no warning about tempdb spillage.

My guess is that the tempdb spill is because of cardinality estimates.

It appears that when I'm using variables, the statistics are way off. It's estimating about 1.45 million rows instead of 6 million.

When I use literals, the estimates are almost exactly correct.

How can I get correct estimates, and avoid tempdb spill, when using variables?

asked Oct 30 '22 by wrschneider

1 Answer

The tempdb spill happened because the cardinality estimates were wrong, and they were wrong because I was using local variables.

Here's why local variables throw the estimates off:

SQL Server's cardinality estimator (CE) can use two kinds of statistics to estimate how many rows will pass a predicate filter:

  1. Statistics about the column on average using the density vector, and
  2. Statistics about particular values for that column using the histogram

If you're not familiar with statistics objects and their density vectors / histograms, the sketch below shows one way to inspect them on the demo table used in the next section.
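A minimal sketch: the first query finds which statistics object covers the column, and DBCC SHOW_STATISTICS dumps its contents. The _WA_Sys_... name in the DBCC calls is hypothetical (auto-created statistics get generated names); substitute whatever the first query returns.

-- 1) Find the statistics objects that cover UnitPriceDiscount
SELECT s.name AS stats_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('Sales.SalesOrderDetail')
  AND c.name = 'UnitPriceDiscount';

-- 2) Dump the density vector and histogram
--    (replace the hypothetical name with the one returned above)
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', '_WA_Sys_00000008_44CA3770') WITH DENSITY_VECTOR;
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', '_WA_Sys_00000008_44CA3770') WITH HISTOGRAM;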

When a literal is used, the cardinality estimator can look that literal up in the histogram (the second kind of statistics). When a local variable is used, its value isn't known at compile time, which is when cardinality estimation happens, so the CE has to fall back to the column average in the density vector (the first kind of statistics).

In general, literals give better estimates because the histogram entry is tailored to the specific value, rather than being an average over the whole column.


Example

Case 1: Literal

I'm running the following query on the AdventureWorks2012_Data database:

SELECT *
FROM Sales.SalesOrderDetail
WHERE UnitPriceDiscount = 0

We've got a literal, so the CE will look for the value 0 in the UnitPriceDiscount histogram to figure out how many rows will be returned.

I used debugging output to see which statistics object was used, then queried that object to see its contents; here's the histogram:

UnitPriceDiscount statistics histogram (the step with RANGE_HI_KEY = 0 shows EQ_ROWS = 117996.9)

The value 0 is a RANGE_HI_KEY, so the estimated number of rows with that value is its EQ_ROWS column - in this case 117996.9.
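On builds that have it (SQL Server 2016 SP1 CU2 and later), sys.dm_db_stats_histogram exposes the same histogram as a relational result, so you can read the EQ_ROWS for a value with plain T-SQL; a sketch:

-- Read the histogram step whose RANGE_HI_KEY is 0 for the statistics
-- object whose leading column is UnitPriceDiscount
SELECT h.step_number, h.range_high_key, h.equal_rows
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
   AND sc.stats_column_id = 1                     -- leading column only
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID('Sales.SalesOrderDetail')
  AND c.name = 'UnitPriceDiscount'
  AND TRY_CONVERT(money, h.range_high_key) = 0;   -- range_high_key is sql_variant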

Now let's look at the execution plan for the query:

Execution plan with literal

The 'Filter' step is getting rid of all rows that don't match the predicate, so the 'Estimated Number of Rows' section of its properties has the result of the cardinality estimation:

Filter properties with literal (Estimated Number of Rows: 117997)

That's the value we saw in the histogram, rounded.

Case 2: Local variable

Now we'll try with a local variable:

DECLARE @temp int = 0

SELECT *
FROM Sales.SalesOrderDetail
WHERE UnitPriceDiscount = @temp

The cardinality estimator doesn't have a literal to search for in the histogram, so it has to use the overall density of the column from the density vector:

Density vector for UnitPriceDiscount (All density = 0.1111111)

This number is:

1 / the number of distinct values in the UnitPriceDiscount column

So if you multiply that by the number of rows in the table, you get the average number of rows per value in this column. There are 121317 rows in Sales.SalesOrderDetail, so the calculation is:

121317 * 0.1111111 = 13479.6653187
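You can sanity-check both numbers straight from the table (a sketch; if the statistics are stale, the stored density can drift slightly from what the data says):

-- Reproduce the density-vector math from the data itself
SELECT
    COUNT(*) AS total_rows,                               -- 121317
    COUNT(DISTINCT UnitPriceDiscount) AS distinct_values, -- 9, per the 0.1111111 density
    1.0 / COUNT(DISTINCT UnitPriceDiscount) AS density,   -- ~0.1111111
    COUNT(*) * 1.0 / COUNT(DISTINCT UnitPriceDiscount) AS avg_rows_per_value  -- ~13479.67
FROM Sales.SalesOrderDetail;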

The execution plan:

Execution plan with local variable

The properties of the filter:

Filter properties with local variable (Estimated Number of Rows: 13479.7)

So the new estimation is coming from the density vector, not the histogram.
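As for the original question of getting correct estimates while still using variables: the standard trick is OPTION (RECOMPILE). It compiles the statement after the variables have been assigned, so the CE can sniff their values and use the histogram, at the cost of a compile on every execution. A minimal sketch, reusing the question's placeholder names (table, columns, and dates are all illustrative):

-- OPTION (RECOMPILE) compiles the statement with the current variable
-- values, so the CE can use the histogram instead of the density vector.
DECLARE @start date = '20220101', @end date = '20220630';  -- example values

SELECT x, y, SUM(z)
FROM [table]
WHERE [date] BETWEEN @start AND @end
GROUP BY x, y
OPTION (RECOMPILE);

Moving the query into a stored procedure with @start and @end as parameters also works, since parameters (unlike local variables) are sniffed at compile time, with the usual parameter-sniffing caveats.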

Let me know if you look at the stats objects yourself and the numbers don't add up as above.

answered Nov 15 '22 by alksdjg