Indexes with included columns, what's the difference?


I've never really understood the difference between these two indexes. Can someone please explain what the difference is (performance-wise, what the index structure will look like in the db, storage-wise, etc.)?

Included index

CREATE NONCLUSTERED INDEX IX_Address_PostalCode
    ON Person.Address (PostalCode)
    INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

'Normal' index

CREATE NONCLUSTERED INDEX IX_Address_PostalCode
    ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID);


asked Jan 22 '17 13:01

dadde

1 Answer

The internal storage of indexes uses a B-Tree structure and consists of "index pages" (the root and all intermediate pages) and "index data pages" (the leaf pages only).

Note: do not confuse "index data pages" with the "data pages" (the leaf pages of a clustered index), which store most of the columns of actual data.

Only the index key columns are stored on the index pages (the non-leaf levels); INCLUDE columns are stored only on the leaf-level index data pages.
By placing some columns in the INCLUDE section, less data per index key is stored on each index page.
This means fewer pages are needed to hold the index keys, which makes it easier to keep these frequently used pages cached in memory for longer.
It can also mean fewer levels in the tree. In that case the performance benefit can be much bigger, because every tree level traversed is another page read, and another disk access if that page is not cached.
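
One way to see the page and level counts for yourself is sketched below. It assumes a throwaway copy of the AdventureWorks sample database, and the two index names (IX_Address_PostalCode_Include and IX_Address_PostalCode_Wide) are made up here so that both variants from the question can exist side by side. The numbers you get will depend on your data.

```sql
-- Hypothetical index names: the question uses one name for both variants,
-- so they are renamed here to let both exist on the same table.
CREATE NONCLUSTERED INDEX IX_Address_PostalCode_Include
    ON Person.Address (PostalCode)
    INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

CREATE NONCLUSTERED INDEX IX_Address_PostalCode_Wide
    ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID);

-- DETAILED mode returns one row per B-tree level of each index.
-- index_level 0 is the leaf level (the "index data pages");
-- higher levels are the intermediate and root "index pages".
SELECT i.name AS index_name,
       ps.index_depth,
       ps.index_level,
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Person.Address'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE i.name IN (N'IX_Address_PostalCode_Include', N'IX_Address_PostalCode_Wide')
ORDER BY i.name, ps.index_level DESC;
```

Comparing avg_record_size_in_bytes and page_count on the rows where index_level is greater than 0 should show the narrower non-leaf records of the INCLUDE variant. On a table this small the index_depth may well be the same for both.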

When an index is used, the index key is used to navigate through the index pages to the correct index data page.

If the index has INCLUDE columns, that data is immediately available on the index data page should the query need it.
If the query requires columns that are in neither the index keys nor the INCLUDE columns, an additional "bookmark lookup" is required to fetch the row from the clustered index (or from the heap if no clustered index is defined). Recent versions of SQL Server show this in execution plans as a Key Lookup or RID Lookup.
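
As a rough illustration of the two cases, here is a pair of queries against the INCLUDE version of IX_Address_PostalCode from the question. The postal code value is arbitrary, and ModifiedDate is assumed to be a column of Person.Address (as in the AdventureWorks sample schema) that is in neither the key nor the INCLUDE list. Whether the optimizer actually picks this index depends on your data and statistics.

```sql
-- Report logical reads per statement so the two queries can be compared.
SET STATISTICS IO ON;

-- Covered: every referenced column is either the key (PostalCode)
-- or an INCLUDE column, so the leaf page of the index has everything.
SELECT AddressLine1, City, StateProvinceID
FROM Person.Address
WHERE PostalCode = N'98011';   -- arbitrary example value

-- Not covered: ModifiedDate is not in the index (assumed column).
-- If the index is used, the plan needs a lookup into the clustered
-- index for each matching row to fetch it.
SELECT AddressLine1, City, ModifiedDate
FROM Person.Address
WHERE PostalCode = N'98011';

SET STATISTICS IO OFF;
```

Viewing the actual execution plans for both statements is the easiest way to confirm whether a lookup operator appears for the second one.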

Some things to note that will hopefully address some of your confusion:

If the keys of your index and the filters in your query are not selective enough, the index will most likely be ignored (regardless of what's in your INCLUDE columns).
Every index you create adds overhead to INSERT, UPDATE and DELETE statements, more so for "bigger" indexes. (Bigger applies to INCLUDE columns as well.)
So while you could in theory create a multitude of big indexes with INCLUDE columns to match every permutation of access path, doing so would be very counter-productive.
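
To get a feel for whether an index earns its keep, you can compare how often it is read with how often it has to be maintained. The query below is a minimal sketch using the sys.dm_db_index_usage_stats view for the Person.Address table from the question. Note that these counters are reset when the instance restarts, so treat them as a hint rather than proof.

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) per index
-- on Person.Address since the counters were last reset.
SELECT i.name          AS index_name,
       i.type_desc,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates  -- number of write operations that had to maintain this index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.object_id   = i.object_id
 AND us.index_id    = i.index_id
 AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'Person.Address')
ORDER BY i.index_id;
```

An index with a high user_updates count and few or no seeks and scans is paying the maintenance cost without much benefit.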

It's worth noting that before INCLUDE columns were added as a feature:

It was a common index tuning 'trick' to expand the keys of an index to include columns that weren't needed in the index/filter. (Known as a covering index.)
These columns were commonly required in output columns or as reference columns for joins to other tables.
This would avoid the infamous "bookmark lookups", but had the disadvantage of making the index 'wider' than strictly necessary.
In fact, very often the earlier columns in the index would already identify a unique row, meaning the extra columns would be completely redundant as keys if not for the "avoiding bookmark lookups" benefit.
INCLUDE columns basically allow the same benefit more efficiently.
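
The old trick and its INCLUDE replacement can be sketched as follows. The table dbo.Customer and all of its columns and index names are hypothetical, invented purely for illustration. A side effect worth noticing is that the INCLUDE form can also be declared UNIQUE on the real key alone, which the widened-key form cannot express.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Customer (
    CustomerID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Email       nvarchar(200) NOT NULL,
    DisplayName nvarchar(100) NOT NULL,
    CreatedAt   datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- The old trick: widen the key so the query below is covered.
-- DisplayName and CreatedAt end up at every level of the B-tree,
-- even though Email alone is what the query filters on.
CREATE NONCLUSTERED INDEX IX_Customer_Email_Wide
    ON dbo.Customer (Email, DisplayName, CreatedAt);

-- With INCLUDE: the key stays narrow (and can be enforced as unique),
-- while the extra columns are stored only at the leaf level.
CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_Email
    ON dbo.Customer (Email)
    INCLUDE (DisplayName, CreatedAt);

-- Either index covers this query, so no bookmark lookup is needed.
SELECT DisplayName, CreatedAt
FROM dbo.Customer
WHERE Email = N'someone@example.com';
```

In practice you would keep only one of the two indexes; both are created here just to show the contrast.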

NB: one very important point. You generally get zero benefit from INCLUDE columns in your indexes if you're in the lazy habit of always writing your queries as SELECT * .... By returning all columns you're basically guaranteeing that a bookmark lookup is required in any case.

answered Oct 15 '22 08:10

Disillusioned
