 

SQL Server non-clustered index design

This question concerns designing non-clustered indexes in SQL Server 2005.

I have a large table with several million lines. Rows are only ever read or inserted. Most operations are reads. I have been looking at the various SELECT queries that access the table with the objective of improving read access speed. Disk space isn't really an issue. (Each row has a unique ID, and I am using that as the single field in the clustered index.)

My question is, if a non-clustered index indexes more columns than are used by a query, does that translate into slower query execution than an index that exactly matches the query?

As the number of distinct queries increases, so does the number of permutations of columns used in their WHERE clauses. I'm unsure about the trade-offs between having many indexes with a small number of columns (one for each query) versus fewer indexes on more columns.

For example, say I have two SELECT queries. The first uses columns A, B, C, and D in its WHERE clause, and the second uses A, B, E, and F. Would best practice here be to define two indexes, one on A/B/C/D and the other on A/B/E/F; or a single index on A/B/C/D/E/F?
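For concreteness, the two alternatives could be sketched like this (the table and column names are hypothetical placeholders):

```sql
-- Option 1: two narrower indexes, one matching each query's WHERE clause
CREATE NONCLUSTERED INDEX IX_MyTable_ABCD ON dbo.MyTable (A, B, C, D);
CREATE NONCLUSTERED INDEX IX_MyTable_ABEF ON dbo.MyTable (A, B, E, F);

-- Option 2: a single wider index spanning both column sets
CREATE NONCLUSTERED INDEX IX_MyTable_ABCDEF ON dbo.MyTable (A, B, C, D, E, F);
```

Note that with option 2, a query filtering on A, B, E, and F can only seek on the A/B prefix; E and F would be evaluated as residual predicates.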

Andy Johnson asked Jul 18 '11


3 Answers

First things first: the order of columns in an index matters. Building and tuning your queries accordingly will let you make good use of the indexes you create.

Whether to have two separate indexes or one combined index depends on how the columns relate to each other and on the kinds of queries that are run. In your example, if columns E and F relate to or depend on columns C and D, then it makes sense to have one index covering all the columns.

Santosh Chandavaram answered Oct 10 '22


My question is, if a non-clustered index indexes more columns than are used by a query, does that translate into slower query execution than an index that exactly matches the query?

No, having more columns doesn't slow down queries that filter on a left-based prefix of the index (the first 1, 2, ..., n key columns). That said, a wider index occupies more memory when loaded: on a memory-constrained server, reading it into the buffer cache may push other pages out and slow things down indirectly, but if you have plenty of memory this shouldn't be a problem.

As the number of distinct queries increases, so does the number of permutations of columns used in their WHERE clauses. I'm unsure about the trade-offs between having many indexes with a small number of columns (one for each query) versus fewer indexes on more columns.

You should put the most commonly queried, most selective fields first in the index. Fewer indexes with many columns may not give you what you want.

For instance, if you have an index with the following columns:

  • ColumnA
  • ColumnB
  • ColumnC
  • ColumnD
  • ColumnE
  • ColumnF

in that order, queries filtering against ColumnA, ColumnB, ColumnC, ColumnD, and so on will use the index, but a query filtering only on ColumnE or ColumnF won't.
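To illustrate the left-based-prefix rule, here is a sketch against a hypothetical table with that six-column index:

```sql
-- Can seek: ColumnA and ColumnB form a left-based prefix of the index key
SELECT *
FROM   dbo.MyTable
WHERE  ColumnA = 1
       AND ColumnB = 2;

-- Cannot seek on this index: ColumnE is not a leading key column,
-- so this will typically fall back to a scan
SELECT *
FROM   dbo.MyTable
WHERE  ColumnE = 5;
```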

The situation is different if you instead have six indexes on the table, each with a single column:

  • Index1 - ColumnA
  • Index2 - ColumnB
  • Index3 - ColumnC
  • Index4 - ColumnD
  • Index5 - ColumnE
  • Index6 - ColumnF

in this case, the optimizer will typically pick only one of those six indexes for any given query.

Also, if your index contains a column that is not very selective, it may not be helping you. For instance, if you have a column called GENDER that contains only the values Male, Female, and Unknown, including that column in the index is probably not going to help. When the query runs, SQL Server may determine that the column is not selective enough and decide that a full table scan would be faster.
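A quick way to gauge selectivity is to compare the number of distinct values to the total row count (table and column names here are hypothetical); the closer the ratio is to 1, the more selective the column:

```sql
-- Rough selectivity check: distinct values divided by total rows.
-- A three-value column like GENDER in a table of millions of rows
-- will score very close to 0, i.e. poorly selective.
SELECT COUNT(DISTINCT Gender) * 1.0 / COUNT(*) AS selectivity
FROM   dbo.Person;
```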

There are many ways to find out which indexes your queries use, but one approach I take is to look for indexes that are never used. Run the following query against your database to find out whether the indexes you think are being used really are.

SELECT iv.table_name,
       i.name AS index_name,
       iv.seeks + iv.scans + iv.lookups AS total_accesses,
       iv.seeks,
       iv.scans,
       iv.lookups,
       t.indextype,
       t.indexsizemb
FROM   (SELECT i.object_id,
               Object_name(i.object_id) AS table_name,
               i.index_id,
               SUM(i.user_seeks) AS seeks,
               SUM(i.user_scans) AS scans,
               SUM(i.user_lookups) AS lookups
        FROM   sys.tables t
               INNER JOIN sys.dm_db_index_usage_stats i
                       ON t.object_id = i.object_id
        GROUP  BY i.object_id,
                  i.index_id) AS iv
       INNER JOIN sys.indexes i
               ON iv.object_id = i.object_id
              AND iv.index_id = i.index_id
       INNER JOIN (SELECT sys_schemas.name AS schemaname,
                          sys_objects.name AS tablename,
                          sys_indexes.name AS indexname,
                          sys_indexes.type_desc AS indextype,
                          CAST(partition_stats.used_page_count * 8 / 1024.00
                               AS DECIMAL(10, 3)) AS indexsizemb
                   FROM   sys.dm_db_partition_stats partition_stats
                          INNER JOIN sys.indexes sys_indexes
                                  ON partition_stats.[object_id] = sys_indexes.[object_id]
                                 AND partition_stats.index_id = sys_indexes.index_id
                                 AND sys_indexes.type_desc <> 'HEAP'
                          INNER JOIN sys.objects sys_objects
                                  ON sys_objects.[object_id] = partition_stats.[object_id]
                          INNER JOIN sys.schemas sys_schemas
                                  ON sys_objects.[schema_id] = sys_schemas.[schema_id]
                                 AND sys_schemas.name <> 'sys') AS t
               ON t.indexname = i.name
              AND t.tablename = iv.table_name
--WHERE t.indexsizemb > 200
WHERE  iv.seeks + iv.scans + iv.lookups = 0
ORDER  BY total_accesses ASC;

I generally track down indexes that have never been used, or that still show no use several months after a SQL Server restart, and decide whether they should be deleted. (Keep in mind that the usage statistics in sys.dm_db_index_usage_stats are reset on restart, so give the server time to see a representative workload first.) Sometimes having too many indexes slows down SQL Server's search for the best query plan, and deleting unused indexes can speed that process up.
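Once you've confirmed an index really is unused, removing it is a one-liner (the index and table names below are hypothetical):

```sql
-- Drop an index confirmed unused by the usage-stats query above.
-- This frees its disk space and removes its maintenance cost on inserts.
DROP INDEX IX_MyTable_Unused ON dbo.MyTable;
```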

I hope this helps make sense out of your indexes.

Steve Stedman answered Oct 10 '22


The existing answers are already very good. Here is one more thought: finding an optimal set of indexes for a given workload and memory budget is a hard problem that requires exhaustively searching a large search space.

The Database Engine Tuning Advisor (DTA) implements just that! I recommend you record a representative workload (including writes!) and let the DTA give you suggestions. It will take disk space into account, too.

usr answered Oct 10 '22