Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql server: Estimated number of rows is way off

I get a bizarre execution plan behavior from SQL Server (2005).

TableName: LOG
...contains around 1000 rows

  • ID int
  • Name varchar(50)

Query:

SELECT * 
  FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID DESC) as Row,
               ID, Name
          FROM Log) AS LogWithRowNumbers
 WHERE Row >= 1 
   AND Row <= 2

It estimates the number of rows returned as 9 (Although it's an obvious 2 or less).
Also, removing the "and Row<=2" will increase the execution time by about * 5. ("and Row<=2" and "and Row<=9999999999999" behave the same)

I've updated statistics. But still, this behavior is strange. Adding Row<99999999999 will make the query run faster? why ?

like image 861
Faruz Avatar asked Jan 05 '10 11:01

Faruz


2 Answers

I'm no expert on the inner workings/query optimisation process of SQL Server, but here's my 2 pence (or cents if you prefer).

I believe it's due to the use of the ROW_NUMBER() value within the WHERE clause. As an example, I created a sample table, populated with 1000 rows from ID 1 to 1000 (ID as a primary key) like you said.

If you take out the ROW_NUMBER(), and do the query based on the ID column like this:

Select * FROM
(
 SELECT ID, Name
 FROM Log
)
as LogWithRowNumbers
WHERE ID>=1 and ID<=2

Then it correctly shows row count as being 2 - as expected really.

Now, working backwards, add in the ROW_NUMBER to the inner SELECT but leave the WHERE clause as-is like below:

Select * FROM
(
 SELECT ROW_NUMBER() OVER (ORDER BY ID DESC) AS RowNo,
 ID, Name
 FROM Log
)
as LogWithRowNumbers
WHERE ID>=1 AND ID <=2

This still shows the correct row count as 2.

Finally, set the WHERE clause back to use the RowNo as the column being filtered on instead of ID, this is when the estimated row count jumps to 9.

Hence, I believe it's the use of the ROW_NUMBER() function, being filtered on in the WHERE clause that is the cause. So I'd imagine it's because there are obviously better/more accurate statistics available on actual table columns than there are on this function-produced value.

I hope this at least provides a good starting point, hopefully will be useful!

like image 166
AdaTheDev Avatar answered Sep 29 '22 20:09

AdaTheDev


AdaTheDev is correct. The reason you see this behaviour is because SQL Server has to work out the row numbers for the table before it can use them in the where clause.

This should to be a more efficient way of getting the same results:

 SELECT TOP(2) ROW_NUMBER() OVER (ORDER BY ID DESC) as Row,
 ID, Name
 FROM Log
 ORDER BY ID DESC
like image 25
AndrewWithey Avatar answered Sep 29 '22 20:09

AndrewWithey