Does clustered index on foreign key column increase join performance vs non-clustered?

Many sources recommend clustered indexes mainly for selecting a range of rows with a BETWEEN predicate. When I join on a foreign key column in such a way that the clustered index on it is used, I would guess that clustering helps there too: a range of rows is still being selected, even though they all share the same clustered key value and BETWEEN is not used.

Considering that I care only about that one SELECT with the join and nothing else, is my guess wrong?

620

asked Mar 11 '10 22:03

alpav

4 Answers

Discussing this type of issue in the absolute isn't very useful.

It is always a case-by-case situation!

Essentially, access by way of a clustered index saves one indirection, period.

Assuming the key used in the JOIN is that of the clustered index, a single read [whether from an index seek, a scan, or a partial scan, it doesn't matter] gets you the whole row (record).
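To make the "one indirection" point concrete, here is a toy model in Python. It is only a sketch: the table, the column names, and the two helper functions are made up for illustration, plain sorted lists stand in for B-trees, and nothing here reflects how any real engine lays out pages. It does show why rows sharing one join key come back from a clustered layout as a single contiguous slice, while a non-clustered layout needs one extra lookup per matching row.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: (order_id, customer_id, total); customer_id is the join key.
rows = [(i, i % 5, 10.0 * i) for i in range(20)]

# "Clustered" layout: the rows themselves are stored sorted by customer_id.
clustered = sorted(rows, key=lambda r: r[1])
clustered_keys = [r[1] for r in clustered]

# "Non-clustered" layout: rows stay in insertion order (a heap), and the
# index holds (customer_id, position-in-heap) pairs sorted by key.
heap = rows
nc_index = sorted((r[1], pos) for pos, r in enumerate(heap))
nc_keys = [k for k, _ in nc_index]


def seek_clustered(key):
    """One seek, then read the adjacent rows directly; no extra lookups."""
    lo, hi = bisect_left(clustered_keys, key), bisect_right(clustered_keys, key)
    return clustered[lo:hi], 0


def seek_nonclustered(key):
    """One seek in the index, then one lookup into the heap per matching row."""
    lo, hi = bisect_left(nc_keys, key), bisect_right(nc_keys, key)
    matches = [heap[pos] for _, pos in nc_index[lo:hi]]
    return matches, hi - lo


rows_c, extra_c = seek_clustered(3)
rows_n, extra_n = seek_nonclustered(3)
print(sorted(rows_c) == sorted(rows_n), extra_c, extra_n)
```

Both layouts return the same rows; only the number of extra lookups differs, and that difference grows with the number of rows per key.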

One problem with clustered indexes is that you only get one per table, so you need to use it wisely. Indeed, in some cases it is even wiser not to use any clustered index at all, because of INSERT overhead and fragmentation (depending on the key, the order of new keys, etc.).

Sometimes one gets the equivalent benefits of a clustered index with a covering index, i.e. an index with the desired key sequence followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed, the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, just as with clustered indexes, and aside from the storage overhead, any extra index carries a performance cost during INSERT (and DELETE or UPDATE) queries.
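As a rough illustration of a covering index serving a join, the sketch below uses Python's built-in sqlite3 module. Note the assumptions: this is SQLite, not SQL Server, and the customers/orders schema and the index name are invented for the example. SQLite simply puts the extra column in the index key; SQL Server can also carry non-key columns in an INCLUDE clause. The point is only that the planner can answer the query from the index without touching the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL,
        notes TEXT
    );
    -- Join key first, then the only other orders column the query reads.
    CREATE INDEX ix_orders_customer_total ON orders (customer_id, total);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 1
""").fetchall()

# Each plan row has four columns; the last one is the human-readable detail.
details = [row[3] for row in plan]
for detail in details:
    print(detail)

# True when the planner reports reading orders through the covering index.
uses_covering = any("COVERING INDEX" in d for d in details)
print(uses_covering)
```

If the query also selected orders.notes, the index would no longer cover it and the plan would fall back to an ordinary index search plus a row lookup.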

And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index has absolutely no bearing on the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database, but the underlying fields (columns) are otherwise just like any other field in the table.

To make wise decisions about index structure, one needs

to understand the way the various index types (and the heap) work (and, BTW, this varies somewhat between SQL implementations)
to have a good picture of the statistical profile of the database(s) at hand: which are the big tables, which are the relations, what's the average/maximum cardinality of each relation, what's the typical growth rate of the database, etc.
to have good insight into the way the database(s) is (are) going to be used/queried

Then, and only then, can one make educated guesses about the value [or lack thereof] of having a given clustered index.

106

answered Sep 25 '22 14:09

mjv

I would ask something else: would it be wise to put my clustered index on a foreign key column just to speed a single JOIN up? It probably helps, but..... at a price!

A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:

narrow
static (never changes)
unique
if ever possible: ever increasing

INT IDENTITY fulfills this perfectly - GUIDs do not. See GUID's as Primary Key for extensive background info.

Why narrow? Because the clustering key is added to each and every entry of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....

Why unique? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!

So those are my criteria - put your clustering key on a narrow, stable, unique, hopefully ever-increasing column. If your foreign key column matches those - perfect!

However, I would not under any circumstances put my clustering key on a wide or even compound foreign key. Remember: the value(s) of the clustering key are being added to each and every non-clustered index entry on that table! If you have 10 non-clustered indices and 100,000 rows in your table - that's one million entries. It makes a huge difference whether that's a 4-byte integer, or a 200-byte VARCHAR - HUGE. And not just on disk - in server memory as well. Think very very carefully about what to make your clustered index!
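A quick back-of-the-envelope check of those numbers, in Python. This is a deliberately crude sketch: it counts only the bytes of the repeated clustering key, assumes the VARCHAR(200) is fully populated, and ignores page and row overhead; the function name is made up for the example.

```python
NONCLUSTERED_INDEXES = 10
ROWS = 100_000


def clustering_key_overhead(key_bytes):
    """Bytes spent repeating the clustering key in every non-clustered index entry."""
    entries = NONCLUSTERED_INDEXES * ROWS  # one million entries
    return entries * key_bytes


int_key = clustering_key_overhead(4)      # 4-byte INT
wide_key = clustering_key_overhead(200)   # 200-byte VARCHAR, assumed full
print(int_key, wide_key, wide_key // int_key)  # 4000000 200000000 50
```

So roughly 4 MB versus 200 MB of key copies, a factor of 50, before counting anything else.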

If the key is not unique, SQL Server might need to add a uniqueifier - making things even worse. If the values ever change, SQL Server would have to do a lot of bookkeeping and updating all over the place.

So in short:

putting an index on your foreign keys is definitely a great idea - do it all the time!
I would be very very careful about making that a clustered index. First of all, you only get one clustered index, so which FK relationship are you going to pick? And don't put the clustering key on a wide and constantly changing column.

answered Sep 24 '22 14:09

marc_s

An index on the FK column will help the JOIN because the index itself is ordered: clustered just means that the data on disk (the leaf level) is ordered, rather than only the B-tree.

If you change it to a covering index, then clustered vs non-clustered is irrelevant. What's important is to have a useful index.

answered Sep 25 '22 14:09

gbn

It depends on the database implementation.

For SQL Server, a clustered index is a B-tree whose leaf level holds the data pages themselves, rather than a separate structure that points at the data. The reason you get fast performance is that you can get to the start of the range quickly, and from there the pages form a linked list that is easy to follow.

A non-clustered index is a data structure that contains pointers to the actual records, and as such it has different concerns.

Refer to the documentation regarding Clustered Index Structures.

An index will not help because of the Foreign Key relationship as such, but it will help through the concept of a "covered" index. If your WHERE clause contains a constraint based upon the index, the database will be able to generate the returned data set faster. That is where the performance comes from.

answered Sep 22 '22 14:09

Digicoder

Tags: performance, sql, join, clustered-index, foreign-keys