We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>)
clause but would prefer not to have to do this.
As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.
Update:
I should have said that we've rebuilt all indexes and updated index statistics.
The table definition for one of the culprits is as follows:
CREATE TABLE [tblinvoices]
(
[CustomerID] [int] NOT NULL,
[InvoiceNo] [int] NOT NULL,
[InvoiceDate] [smalldatetime] NOT NULL,
[InvoiceTotal] [numeric](18, 2) NOT NULL,
[AmountPaid] [numeric](18, 2) NULL
CONSTRAINT [DF_tblinvoices_AmountPaid] DEFAULT (0),
[DateEntered] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_DateEntered] DEFAULT (getdate()),
[PaymentRef] [varchar](110),
[PaymentType] [varchar](10),
[SyncStatus] [int] NULL,
[PeriodStart] [smalldatetime] NULL,
[DateIssued] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_dateissued] DEFAULT (getdate()),
CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED
(
[InvoiceNo] ASC
) ON [PRIMARY]
) ON [PRIMARY]
There is one other index on this table (the one we want SQL to use):
CustomerID (Non-Unique, Non-Clustered)
The following query performs a table scan instead of using the CustomerID
index:
SELECT
CustomerID,
Sum(InvoiceTotal) AS SumOfInvoiceTotal,
Sum(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY customerID
Updated:
In answer to Autocracy's question, both of those queries perform a table scan.
Updated:
In answer to Quassnoi's question about DBCC SHOW_STATISTICS
, the data is:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
1667 246 454 8 27.33333
2112 911 3427 16 56.9375
2133 914 775 16 57.125
The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns". In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.
Explanation: Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.
By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".
(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...
Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!
We're having a problem where indexes on our tables are being ignored and
SQL Server 2000
is performing table scans instead.
Despite 4,302
days that have passed since Aug 29, 1997
, SQL Server
's optimizer has not evolved into SkyNet
yet, and it still can make some incorrect decisions.
Index hints are just the way you, a human being, help the artificial intelligence.
If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.
They are legitimate, correct, documented and supported by Microsoft
way to enforce the query plan you want.
In your case:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY
CustomerID
, the optimizer has two choises:
KEY LOOKUP
to fetch the values of InvoiceTotal
and AmountPaid
rows fetched per second
, but longer in terms of total row count.The first method may or may not be faster than the second one.
The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.
For selective indexes, the former method is faster; for non-selective ones, the latter is.
Could you please run this query:
SELECT 1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM tlbInvoices
Update:
Since CustomerID = 2112
covers only 1,4%
of your rows, you should benefit from using the index.
Now, could you please run the following query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
, locate two adjacents rows in the third resultset with RANGE_HI_KEY
being less and more than 2112
, and post the rows here?
Update 2:
Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.
Probably (probably) this is because this very value (2112
) occurs in the RANGE_HI_KEY
and the optimizer sees that it's unusually dense (3427
values for 2112
alone against only 911
for the whole range from 1668
to 2111
)
Could you please do two more things:
Run this query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
and post the first two resultsets.
Run this query:
SELECT TOP 1 CustomerID, COUNT(*) FROM tblinvoices WHERE CustomerID BETWEEN 1668 AND 2111
, use the top CustomerID
from the query above in your original query:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = @Top_Customer
GROUP BY
CustomerID
and see what plan will it generate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With