I recently encountered an index in a database I maintain that was of the form:
CREATE INDEX [IX_Foo] ON [Foo]
( Id ASC )
INCLUDE
( SubId )
In this particular case, the performance problem that I was encountering (a slow SELECT filtering on both Id and SubId) could be fixed by simply moving the SubId column into the index proper rather than as an included column.
This got me thinking however that I don't understand the reasoning behind included columns at all, when generally, they could simply be a part of the index itself. Even if I don't particularly care about the items being in the index itself is there any downside to having column in the index rather than simply being included.
After some research, I am aware that there are a number of restrictions on what can go into an indexed column (maximum width of the index, and some column types that can't be indexed like 'image'). In these cases I can see that you would be forced to include the column in the index page data.
The only thing I can think of is that if there are updates on SubId, the row will not need to be relocated if the column is included (though the value in the index would need to be changed). Is there something else that I'm missing?
I'm considering going through the other indexes in the database and shifting included columns in the index proper where possible. Would this be a mistake?
I'm primarily interested in MS SQL Server, but information on other DB engines is welcome also.
The answers so far are all correct and all - but they might not convey enough what you gain from a covering index.
In your case, you have a table Foo
and some fields, including an Id
(which I assume is the primary key), and a SubId
which is some additional ID of some kind.
You also have an index IX_Foo
which I assume had only Id
in it for now.
So now you need to find the SubId
for Id=4
.
SELECT Id, SubId
FROM Foo
WHERE Id=4
IX_Foo
Id=4
in your index IX_Foo
SubId
, tooIX_Foo
will contain the clustering key valueSubId
from itThe main point here is: once SQL Server has found your Id=4
in the IX_Foo
index, it will then need to do another I/O operation, a bookmark lookup, to go fetch the whole data row, in order to be able to find the SubId
value.
If you have a covering index, e.g. IX_Foo
also includes SubId
, that extra I/O to do the bookmark lookup is eliminated. Once the value Id=4
is found in the IX_Foo
index, that index page in your non-clustered index will also include the value of SubId
- SQL Server can now return those two values you asked for in your SELECT query without having to do an extra (potentially expensive and thus slow) bookmark lookup just to go fetch another Id column.
That's the main benefit of covering indices - if you only need one or two extra columns, besides the index values you're doing the lookup on, by including those values into the index itself, you can save yourself a lot of bookmark lookups and thus speed things up significantly. You should however only include very few, and small bits of information - don't duplicate your entire data rows into all non-clustered indices! That's not the point.
UPDATE: the trade-off is this: if you have an index on (Id, SubId), all the pages in the index have both columns - the whole index tree through.
If you INCLUDE(SubId), the SubId fields are only present at the leaf level.
This means
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With