Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I remove clustered index from surrogate keys?

From what I understand, an advantage of a clustered index is that it's faster to seek something there if it's part of the indexed columns as the rows are stored 'close'. Since SQL Server automatically makes primary keys as 'clustered index', is it not true that using a clustered index for something that has no meaning like a surrogate key removes this advantage, and instead I should make the natural key the clustered index?

like image 812
Rodolfo Avatar asked Nov 01 '25 14:11

Rodolfo


1 Answers

There is no right or wrong answer here, as it depends. Advantages of clustering, say, an IDENTITY column that is a primary key, as opposed to clustering on something else, include:

  • elimination of "bad" page splits, since increasing values are always inserted at the "end" of the table/partition (this introduces "hot spots" - which may become an issue if you are inserting at a very high rate).
  • no lookups required when a seek or scan is used to find a row or range - if you use a non-clustered index to identify a row, lookups may be necessary to retrieve any non-covered columns.
  • an INT is 4 bytes (and less if you use data compression). So when you have child tables that reference this table, it is quite useful for the primary key to be skinny so that the repeated information in the child tables is also skinny. Having it clustered doesn't have a ton of bearing on this, but it seems like a hand-wavy solution to use a natural key for the clustered index but then perform joins to child tables using the non-clustered surrogate key. In most solutions I've seen, the surrogate is clustered (and distributed throughout the model), and the natural key is simply made unique.

Whether you have a better candidate for the clustering key, who knows? We don't know your application or even what type of data you're thinking about for the natural key. What is the data model? What is the natural key? Is it likely to change? Is it large? Is it really unique?

As an aside, I like to think that SQL Server creates PRIMARY KEY as CLUSTERED by default because (a) most tables should have a clustered index and (b) in most cases the key should be clustered (but not all!).

like image 147
Aaron Bertrand Avatar answered Nov 03 '25 11:11

Aaron Bertrand