Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server Index Which should be clustered?

I have a number of indexes on some tables, they are all similar and I want to know if the Clustered Index is on the correct column. Here are the stats from the two most active indexes:

Nonclustered
I3_Identity (bigint)
rows: 193,781
pages: 3821
MB: 29.85
user seeks: 463,355
user_scans: 784
user_lookups: 0
updates: 256,516

Clustered Primary Key
I3_RowId (varchar(80))
rows: 193,781
pages: 24,289
MB: 189.76
user_seeks: 2,473,413
user_scans: 958
user_lookups: 463,693
updates: 2,669,261

As you can see, the PK is being seeked often, but all the seeks for the i3_identity column are doing key lookups to this PK as well, so am I really benefiting from the index on I3_Identity much at all? Should I change to using the I3_Identity as the clustered? This could have a huge impact as this table structure is repeated about 10000 times where I work, so any help would be appreciated.

like image 849
kdevine Avatar asked May 01 '09 19:05

kdevine


Video Answer


1 Answers

Frederik sums it up nicely, and that's really what Kimberly Tripp also preaches: the clustering key should be stable (never changes), ever increasing (IDENTITY INT), small and unique.

In your scenario, I'd much rather put the clustering key on the BIGINT column rather than the VARCHAR(80) column.

First of all, with the BIGINT column, it's reasonably easy to enforce uniqueness (if you don't enforce and guarantee uniqueness yourself, SQL Server will add a 4-byte "uniquefier" to each and every one of your rows) and it's MUCH smaller on average than a VARCHAR(80).

Why is size so important? The clustering key will also be added to EACH and every one of your non-clustered indexes - so if you have a lot of rows and a lot of non-clustered indexes, having 40-80 byte vs. 8 byte can quickly make a HUGE difference.

Also, another performance tip: in order to avoid the so-called bookmark lookups (from a value in your non-clustered index via the clustering key into the actual data leaf pages), SQL Server 2005 has introduced the notion of "included columns" in your non-clustered indexes. Those are extremely helpful, and often overlooked. If your queries often require the index fields plus just one or two other fields from the database, consider including those in order to achieve what is called "covering indexes". Again - see Kimberly Tripp's excellent article - she's the SQL Server Indexing Goddess! :-) and she can explain that stuff much better than I can...

So to sum it up: put your clustering key on a small, stable, unique column - and you'll do just fine!

Marc

like image 105
marc_s Avatar answered Oct 25 '22 01:10

marc_s