Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Indexes and Missing Index Recommendations in SSMS

UNDERSTANDING INDEXES & MISSING INDEX RECOMMENDATIONS

I'm trying to gain a better understanding of indexes. I have a lot of reading to do, and have found a number of valuable resources from other SO posts, some of which I've read, others I still need to read. In the meantime, I'm trying to get better performance out of my database.

I've learned that a covering index is going to be better performing than indexes on individual columns, so I decided to start by deleting my individual indexes and letting the proposed query execution plan recommend indexes.

SSMS INDEX RECOMMENDATION

CREATE NONCLUSTERED INDEX IX_my_index_name
ON [dbo].[my_table] ([field_a],[field_b])
INCLUDE (
   [field_1]
  ,[field_2]
  ,[field_3]
  ,[field_4]
  ,[field_5]
  ,[field_6]
)

TABLE DETAILS

fields 1-6 are the columns I commonly use to join the 2 tables I'm using. fields a & b are found in the where clause of a few time consuming queries I run.

I understand using fields 1-6 because for the most part they all contain many different values, but field a has only about 75 distinct values, and field b only has 3 distinct values. This is in a table with 70MM records in it.

Note that this is a heap. All of the records on this table come from another table that has a primary key, so that unique value comes with it, but it's not set up as a key or a unique index on this table. SSMS didn't recommend including that column in this index. Wondering how I should handle the unique value coming into this table? A clustered, unique index I'm guessing?

MY QUESTIONS

  1. I want to understand the logic behind this index recommendation. Given the information regarding the similar values in columns a & b, why was this recommended?

  2. I want to understand the difference between the ON columns and the INCLUDE columns?

like image 833
Jon Avatar asked Jul 24 '15 12:07

Jon


People also ask

How do I analyze missing indexes in SQL Server?

To determine which missing index groups a particular missing index is part of, you can query the sys. dm_db_missing_index_groups dynamic management view by equijoining it with sys. dm_db_missing_index_details based on the index_handle column.

What does missing index mean in SQL?

What Are Missing Indexes? When SQL Server is processing a query, it will sometimes make a suggestion for an index that it believes will help that query run faster. These are known as Missing Indexes, or as I like to refer to them, Missing Index Suggestions.


2 Answers

The ON columns in the index can be used for searching the rows. Those fields are included in the index tree. Once the rows are found, if any additional columns are needed, for example fields in select part or joins, they have to be fetched from the table. This is called a key lookup in the execution plan.

If the index has multiple columns, and not all columns are specified in the where clause, the columns can be used from first onwards as long as the fields are given. For example index has fields A, B, C, D and where clause has fields A, B and D, then only A and B can be used to fetch the data.

If the table has a clustered index, the values of the keys in the clustered index are stored in the other indexes and are used to find the row from the table itself. If there is no clustered index, RID (Row ID) is used in similar way to locate the rows from the table.

The include columns in index are additional columns and their data is stored at the leaf level of the non-clustered index. This way SQL Server can read the data directly from there and skip the whole part of reading the table. This is called a covering index.

like image 148
James Z Avatar answered Nov 11 '22 14:11

James Z


The first thing I'd ask is whether there is a good reason for a table of that size doesn't have a clustered index? A clustered key doesn't even have to be unique (SQL Server will add a 'uniquifier' to it if not, although it's usually best to use an IDENTITY column).

To answer your two questions:

1) The index recommendation is related to the query you are running. As a rule of thumb, the suggested columns will match the columns the query optimiser is using to probe into the table, so if you have a query like:

SELECT field1, field2, field3
FROM   table1
WHERE  field4 = 1 AND field5 = 'bob'

The suggested index is likely to be on the field4 and field5 columns, and in order of selectivity (i.e. the column with the most variation in values first). It may include other columns (for instance field1, field2, field3) because then the query optimiser will only have to visit the index to get that data, and not visit the data page.

Note also that sometimes the suggested index is not always the one you might choose yourself. If joining several tables, the query optimiser will choose the execution plan that it thinks best suits the data, based on available indexes and statistics. It might loop over one table and probe into another, when the best possible plan might do it the other way around. You have to inspect the actual query execution plan to see what is going on.

If you know your query is selective enough to drill down to a small range of records (for instance has a where clause like WHERE table1.field1 = 1 AND table1.field2 = 'abc' AND table1.field3 = '2015-07-01' ...), you can add an index that covers all the referenced columns. This might influence the query optimiser to scan this index to get a small number of rows to join to another table, rather than performing scans.

As a rule of thumb, a good place to start when examining the execution plans is trying to eliminate scans, where the server will be reading a large range of rows, and provide indexes that narrow down the amount of data that has to be processed.

2) I think others have probably explained this well enough by now - the included columns are there so that when the index is read, the server doesn't then have to read the data page to get those values; they are stored on the index as well.

The initial response a lot of people may have when they read about such 'covering indexes' is "why don't I add a whole bunch of indexes that do this", or "why don't I add an index that covers all the columns".

In some situations (usually small tables with narrow columns, such as many-to-many joining tables), this is useful. However, with each index you add comes some costs:

Firstly, every time you update or insert a value into your table, the index has to be updated. This means you will have to contend with locking, lock escalation issues (possibly deadlocking), page splits, and the associated fragmentation. There are various ways to mitigate these issues, such as using an appropriate fill-factor to allow more values to be inserted into an index page without having to split it.

Secondly, indexes take up space. At the very least, an index is going to contain the key values you use and either the RID (in a heap) or clustering key (in a table with a clustered index). Covering indexes also contain a copy of the included columns. If these are large columns (such as big varchars) then the index can be quite large and it is not unheard of for a tables indexes to add up to be bigger than the table itself. Note that there are also limits on the size of an index, both in terms of columns, and total size. Because the clustering key is always included in non-clustered indexes on a table with a clustered index (the clustered index is on the data page itself), this means that a smaller clustered key is better. Whilst you can use a composite index, this is likely to be several bytes wide, and whilst you can use a non-unique key, SQL Server will add that uniquifier to it, which is another 4 bytes. Best practice is to use an identify column (int, or bigint if you envisage ever having more than 2 billion rows in the table). Identities also always increment, so you won't get page splits in your data pages when inserting a new record, as it will always go on the end of the table.

so the tl;dr; is:

The suggested indexes can be useful, but often don't give the best index. if you know the structure of your data and how it will be queried, you can construct indexes that contain the commonly use probing keys.

Always order the columns in your index in the order of selectivity (i.e. the column with the most values first). This might seem counter-intuitive, but it allows SQL Server to find the data you want faster, with fewer reads.

Included columns are useful, but only usually when they are small columns (e.g. integers). If your query needs six columns from a table and the index covers only five of them, SQL Server will still have to visit the data page, so in this case you're better off without the included columns because they just take up space and have a maintenance cost.

like image 41
Ed B Avatar answered Nov 11 '22 15:11

Ed B