Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use INCLUDE in a SQL index

I recently encountered an index in a database I maintain that was of the form:

CREATE INDEX [IX_Foo] ON [Foo]
( Id ASC )
INCLUDE 
( SubId )

In this particular case, the performance problem that I was encountering (a slow SELECT filtering on both Id and SubId) could be fixed by simply moving the SubId column into the index proper rather than as an included column.

This got me thinking however that I don't understand the reasoning behind included columns at all, when generally, they could simply be a part of the index itself. Even if I don't particularly care about the items being in the index itself is there any downside to having column in the index rather than simply being included.

After some research, I am aware that there are a number of restrictions on what can go into an indexed column (maximum width of the index, and some column types that can't be indexed like 'image'). In these cases I can see that you would be forced to include the column in the index page data.

The only thing I can think of is that if there are updates on SubId, the row will not need to be relocated if the column is included (though the value in the index would need to be changed). Is there something else that I'm missing?

I'm considering going through the other indexes in the database and shifting included columns in the index proper where possible. Would this be a mistake?

I'm primarily interested in MS SQL Server, but information on other DB engines is welcome also.

like image 402
StarLite Avatar asked Mar 13 '10 01:03

StarLite


Video Answer


1 Answers

The answers so far are all correct and all - but they might not convey enough what you gain from a covering index.

In your case, you have a table Foo and some fields, including an Id (which I assume is the primary key), and a SubId which is some additional ID of some kind.

You also have an index IX_Foo which I assume had only Id in it for now.

So now you need to find the SubId for Id=4.

SELECT Id, SubId
FROM Foo
WHERE Id=4
  • SQL Server will look at the SELECT statement and determine it can use IX_Foo
  • it will then go search for the value Id=4 in your index IX_Foo
  • when it finds it, it now needs the value of SubId, too
  • the non-clustered index IX_Foo will contain the clustering key value
  • using that clustering key value, SQL Server will do a "bookmark lookup" to locate the actual data page where your entire data row is located
  • it will fetch that page and extract the value for SubId from it
  • it will return those values to satisfy your query

The main point here is: once SQL Server has found your Id=4 in the IX_Foo index, it will then need to do another I/O operation, a bookmark lookup, to go fetch the whole data row, in order to be able to find the SubId value.

If you have a covering index, e.g. IX_Foo also includes SubId, that extra I/O to do the bookmark lookup is eliminated. Once the value Id=4 is found in the IX_Foo index, that index page in your non-clustered index will also include the value of SubId - SQL Server can now return those two values you asked for in your SELECT query without having to do an extra (potentially expensive and thus slow) bookmark lookup just to go fetch another Id column.

That's the main benefit of covering indices - if you only need one or two extra columns, besides the index values you're doing the lookup on, by including those values into the index itself, you can save yourself a lot of bookmark lookups and thus speed things up significantly. You should however only include very few, and small bits of information - don't duplicate your entire data rows into all non-clustered indices! That's not the point.

UPDATE: the trade-off is this: if you have an index on (Id, SubId), all the pages in the index have both columns - the whole index tree through.

If you INCLUDE(SubId), the SubId fields are only present at the leaf level.

This means

  • SQL Server can't search and compare on SubId (the values are not in the index tree)
  • less space is used since the values are only on the leaf level
like image 98
marc_s Avatar answered Oct 25 '22 22:10

marc_s