Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are my SQL indexes being ignored?

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>) clause but would prefer not to have to do this.

As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.

Update:

I should have said that we've rebuilt all indexes and updated index statistics.

The table definition for one of the culprits is as follows:

CREATE TABLE [tblinvoices]
(
    [CustomerID] [int] NOT NULL,
    [InvoiceNo] [int] NOT NULL,
    [InvoiceDate] [smalldatetime] NOT NULL,
    [InvoiceTotal] [numeric](18, 2) NOT NULL,
    [AmountPaid] [numeric](18, 2) NULL 
        CONSTRAINT [DF_tblinvoices_AmountPaid]  DEFAULT (0),
    [DateEntered] [smalldatetime] NULL 
        CONSTRAINT [DF_tblinvoices_DateEntered]  DEFAULT (getdate()),
    [PaymentRef] [varchar](110),
    [PaymentType] [varchar](10),
    [SyncStatus] [int] NULL,
    [PeriodStart] [smalldatetime] NULL,
    [DateIssued] [smalldatetime] NULL 
        CONSTRAINT [DF_tblinvoices_dateissued]  DEFAULT (getdate()),
    CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED 
    (
        [InvoiceNo] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]

There is one other index on this table (the one we want SQL to use):

CustomerID (Non-Unique, Non-Clustered)

The following query performs a table scan instead of using the CustomerID index:

SELECT 
    CustomerID, 
    Sum(InvoiceTotal) AS SumOfInvoiceTotal, 
    Sum(AmountPaid) AS SumOfAmountPaid 
FROM tblInvoices 
WHERE CustomerID = 2112 
GROUP BY customerID

Updated:

In answer to Autocracy's question, both of those queries perform a table scan.

Updated:

In answer to Quassnoi's question about DBCC SHOW_STATISTICS, the data is:

RANGE_HI_KEY    RANGE_ROWS    EQ_ROWS    DISTINCT_RANGE_ROWS    AVG_RANGE_ROWS
1667            246           454        8                      27.33333
2112            911           3427       16                     56.9375
2133            914           775        16                     57.125
like image 858
Kev Avatar asked Jun 09 '09 11:06

Kev


2 Answers

The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns". In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.

Explanation: Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.

By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".

(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...

Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!

like image 186
BradC Avatar answered Nov 15 '22 17:11

BradC


We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead.

Despite 4,302 days that have passed since Aug 29, 1997, SQL Server's optimizer has not evolved into SkyNet yet, and it still can make some incorrect decisions.

Index hints are just the way you, a human being, help the artificial intelligence.

If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.

They are legitimate, correct, documented and supported by Microsoft way to enforce the query plan you want.

In your case:

SELECT CustomerID, 
       SUM(InvoiceTotal) AS SumOfInvoiceTotal, 
       SUM(AmountPaid) AS SumOfAmountPaid 
FROM   tblInvoices 
WHERE  CustomerID = 2112 
GROUP BY
       CustomerID

, the optimizer has two choises:

  • Use the index which implies a nested loop over the index along with KEY LOOKUP to fetch the values of InvoiceTotal and AmountPaid
  • Do not use the index and scan all tables rows, which is faster in rows fetched per second, but longer in terms of total row count.

The first method may or may not be faster than the second one.

The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.

For selective indexes, the former method is faster; for non-selective ones, the latter is.

Could you please run this query:

SELECT  1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM    tlbInvoices

Update:

Since CustomerID = 2112 covers only 1,4% of your rows, you should benefit from using the index.

Now, could you please run the following query:

DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])

, locate two adjacents rows in the third resultset with RANGE_HI_KEY being less and more than 2112, and post the rows here?

Update 2:

Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.

Probably (probably) this is because this very value (2112) occurs in the RANGE_HI_KEY and the optimizer sees that it's unusually dense (3427 values for 2112 alone against only 911 for the whole range from 1668 to 2111)

Could you please do two more things:

  1. Run this query:

    DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
    

    and post the first two resultsets.

    • Run this query:

      SELECT TOP 1 CustomerID, COUNT(*) FROM tblinvoices WHERE CustomerID BETWEEN 1668 AND 2111

    , use the top CustomerID from the query above in your original query:

    SELECT CustomerID, 
           SUM(InvoiceTotal) AS SumOfInvoiceTotal, 
           SUM(AmountPaid) AS SumOfAmountPaid 
    FROM   tblInvoices 
    WHERE  CustomerID = @Top_Customer
    GROUP BY
           CustomerID
    

    and see what plan will it generate.

like image 32
Quassnoi Avatar answered Nov 15 '22 18:11

Quassnoi