I've recently got the advice, that I should convert all our tables from using heap indexes such that each table has a clustered index. What are the consequences of persuing this strategy? E.g. is it more important to regularly reorganize the database? datagrowth? danger of really slow inserts? Danger of page-defragmentation if the PK is a GUID? Noticable speed-increase of my application? What are your experiences?
To serve as inspiration for good answers, here are some of the "facts" I've picked up from other threads here on stackoverflow
Data inserted only at the end of the Clustered Index can introduce the so-called Last Page Insert Latch Contention, because you have a single hotspot at the end of your Clustered Index, where multiple queries compete against each other, when traversing through the B-tree structure.
Disabling a clustered index on a view or a nonclustered index physically deletes the index data. Disabling a clustered index on a table prevents access to the data; the data still remains in the table, but is unavailable for data manipulation language (DML) operations until the index is dropped or rebuilt.
The Effects of Poor Indexing If indexes are not created properly, the database has to go through more records in order to retrieve the data requested by a query. Therefore, it uses more hardware resources (processor, memory, disk, and network) and makes fetching the data take longer.
A clustered index is useful for range queries because the data is logically sorted on the key. You can move a table to another filegroup by recreating the clustered index on a different filegroup. You do not have to drop the table as you would to move a heap. A clustering key is a part of all nonclustered indexes.
If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because on GUIDs you absolutely never ever can have range scans on them (what could between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac'
possibly mean??). With a width of 16 bytes, a GUID clustered index key is wider than a row id that you'd get from a heap, so a NC index on a PK guid is actually strategy that can be defended in a discussion.
But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like date
, state
or deleted
. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'.
The only scenario where heaps have significant performance benefit is inserts, specially bulk inserts. IF your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines.
Going over over your points:
Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.
A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere as helpful as one that can satisfy a range.
Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.
Only probe SELECTs will be improved: SELECT ... WHERE key='someguid';
. Queries by object ID and Foreign key lookups will benefit from this clustered index. A NC index can server the same purpose just as well.
The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
Wrong. Insert into position in an index does not have to move data. The worst it can happen is a page-split. A Page-split is (somehow) expensive, but is not the end of the world. You comment suggest that all data (or at least a 'significant' part) has to be moved to make room for the new row, this is nowhere near true.
Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
I can't possibly imagine a scenario where GUID can have 'related data'. A GUID is the quintessential random structure how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in linked article would not work for unique keys, only for foreign keys).
Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.
For probes (lookup a specific unique key) yes. A NC index is almost as fast as the clustered index (the NC index lookup does require and additional key lookup to fetch in the rest of the columns). Where clustered index shines is range scans, as it the clustered index can cover any query, while a NC index that could potentially satisfy the same range may loose on the coverage and trigger the Index Tipping Point.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With