Lately, I have been doing some reading on indexes of all types and the main advice is to put the clustered index on the primary key of the table, but what if the primary key actually is not used in a query (via a select or join) and is just put for purely relational purposes, so in this case it is not queried against. Example, say I have a car_parts
table and it contains 3 columns, car_part_id
, car_part_no
, and car_part_title
. car_part_id
is the unique primary key identity column. In this case car_part_no
is unique as well and is most likely car_part_title
. car_part_no
is what is most queried against, so doesn't it make sense to put the clustered index on that column instead of car_part_id
? The basics of the question is what column should actually have the clustered index since you are only allowed one of them?
Kimberly Tripp is always one of the best sources on insights on indexing.
See her blog post "Ever-increasing clustering key - the Clustered Index Debate - again!" in which she quite clearly lists and explains the main requirements for a good clustering key - it needs to be:
and best of all, if you can manage:
Taking all this into account, an INT IDENTITY
(or BIGINT IDENTITY
if you really need more than 2 billion rows) works out to be the best choice in the vast majority of cases.
One thing a lot of people don't realize (and thus don't take into account when making their choice) is the fact that the clustering key (all the columns that make up the clustered index) will be added to each and every index entry for each and every non-clustered index on your table - thus the "narrow" requirement becomes extra important!
Also, since the clustering key is used for bookmark lookups (looking up the actual data row when a row is found in a non-clustered index), the "unique" requirement also becomes very important. So important in fact, that if you choose a (set of) column(s) that is/are not guaranteed to be unique, SQL Server will add a 4-byte uniquefier to each row --> thus making each and every of your clustered index keys extra wide ; definitely NOT a good thing.
Marc
An index, clustered or non clustred, can be used by the query optimizer if and only if the leftmost key in the index is filtered on. So if you define an index on columns (A, B, C), a WHERE condition on B=@b
, on C=@c
or on B=@b AND C=@c
will not fully leverage the index (see note). This applies also to join conditions. Any WHERE filter that includes A
will consider the index: A=@a
or A=@a AND B=@b
or A=@a AND C=@c
or A=@a AND B=@b AND C=@c
.
So in your example if you make the clustred index on part_no
as the leftmost key, then a query looking for a specific part_id
will not use the index and a separate non-clustered index must exist on part-id
.
Now about the question which of the many indexes should be the clustered one. If you have several query patterns that are about the same importance and frequency and contradict each other on terms of the keys needed (eg. frequent queries by either part_no
or part_id
) then you take other factors into consideration:
Note: not fully leverage as sometimes the engine will choose an non-clustered index to scan instead of the clustered index simply because is narrower and thus has fewer pages to scan. In my example if you have an index on (A, B, C) and a WHERE filter on B=@b
and the query projects C
, the index will be likely used but not as a seek, as a scan, because is still faster than a full clustered scan (fewer pages).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With