I've recently got the advice, that I should convert all our tables from using heap indexes such that each table has a clustered index. What are the consequences of persuing this strategy? E.g. is it more important to regularly reorganize the database? datagrowth? danger of really slow inserts? Danger of page-defragmentation if the PK is a GUID? Noticable speed-increase of my application? What are your experiences? To serve as inspiration for good answers, here are some of the "facts" I've picked up from other threads here on stackoverflow <ol> <li>Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.</li> <li>Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.</li> <li>The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table. </li> <li>Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html </li> <li>Clustering doesn't affect lookup speed - a unique non-clustered index should do the job. </li> </ol>

If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because on GUIDs you absolutely never ever can have range scans on them (what could <code>between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac'</code> possibly mean??). With a width of 16 bytes, a GUID clustered index key is wider than a row id that you'd get from a heap, so a NC index on a PK guid is actually strategy that can be defended in a discussion. But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like <code>date</code>, <code>state</code> or <code>deleted</code>. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'. The only scenario where heaps have significant performance benefit is inserts, specially bulk inserts. IF your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines. Going over over your points: <blockquote> Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better. </blockquote> A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere as helpful as one that can satisfy a range. <blockquote> Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved. </blockquote> Only probe SELECTs will be improved: <code>SELECT ... WHERE key='someguid';</code>. Queries by object ID and Foreign key lookups will benefit from this clustered index. A NC index can server the same purpose just as well. <blockquote> The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table. </blockquote> Wrong. Insert into position in an index does not have to move data. The worst it can happen is a page-split. A Page-split is (somehow) expensive, but is not the end of the world. You comment suggest that all data (or at least a 'significant' part) has to be moved to make room for the new row, this is nowhere near true. <blockquote> Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html </blockquote> I can't possibly imagine a scenario where GUID can have 'related data'. A GUID is the quintessential random structure how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in linked article would not work for unique keys, only for foreign keys). <blockquote> Clustering doesn't affect lookup speed - a unique non-clustered index should do the job. </blockquote> For probes (lookup a specific unique key) yes. A NC index is almost as fast as the clustered index (the NC index lookup does require and additional key lookup to fetch in the rest of the columns). Where clustered index shines is range scans, as it the clustered index can cover any query, while a NC index that could potentially satisfy the same range may loose on the coverage and trigger the Index Tipping Point.

What are the consequences of converting heap-indexes to clustered indexes on SQL Server?

Tags:

sql

sql-server

indexing

clustered-index

I've recently got the advice, that I should convert all our tables from using heap indexes such that each table has a clustered index. What are the consequences of persuing this strategy? E.g. is it more important to regularly reorganize the database? datagrowth? danger of really slow inserts? Danger of page-defragmentation if the PK is a GUID? Noticable speed-increase of my application? What are your experiences?

To serve as inspiration for good answers, here are some of the "facts" I've picked up from other threads here on stackoverflow

Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.
Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.
The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.

429

asked Dec 13 '10 21:12

Carlo V. Dango

1 Answers

If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because on GUIDs you absolutely never ever can have range scans on them (what could between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac' possibly mean??). With a width of 16 bytes, a GUID clustered index key is wider than a row id that you'd get from a heap, so a NC index on a PK guid is actually strategy that can be defended in a discussion.

But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like date, state or deleted. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'.

The only scenario where heaps have significant performance benefit is inserts, specially bulk inserts. IF your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines.

Going over over your points:

Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.

A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere as helpful as one that can satisfy a range.

Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.

Only probe SELECTs will be improved: SELECT ... WHERE key='someguid';. Queries by object ID and Foreign key lookups will benefit from this clustered index. A NC index can server the same purpose just as well.

The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.

Wrong. Insert into position in an index does not have to move data. The worst it can happen is a page-split. A Page-split is (somehow) expensive, but is not the end of the world. You comment suggest that all data (or at least a 'significant' part) has to be moved to make room for the new row, this is nowhere near true.

Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html

I can't possibly imagine a scenario where GUID can have 'related data'. A GUID is the quintessential random structure how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in linked article would not work for unique keys, only for foreign keys).

Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.

For probes (lookup a specific unique key) yes. A NC index is almost as fast as the clustered index (the NC index lookup does require and additional key lookup to fetch in the rest of the columns). Where clustered index shines is range scans, as it the clustered index can cover any query, while a NC index that could potentially satisfy the same range may loose on the coverage and trigger the Index Tipping Point.

187

answered Nov 14 '22 22:11

Remus Rusanu

Related questions
                            
                                Exception java.sql.SQLException: Parameter index out of range (1 > number of parameters, which is 0) [duplicate]
                            
                                Can I chain database links in Oracle?
                            
                                How to use the merge command
                            
                                Connection to SQL Server with qt [closed]
                            
                                Help with a SQL Query
                            
                                Do I gain any performance advantages by using VIEWs rather than JOINs?
                            
                                How can I insert a record and lock that row simultaneously?
                            
                                ADODB connection to ACCDB file -- unrecognized database format error
                            
                                Does something like Common Table Expressions exist in PL/SQL?
                            
                                Outer join performance
                            
                                Is storing SQL queries in a database a bad thing?
                            
                                SQL Server 2008 Update Query with Join and Where clause in joined table
                            
                                Update Table with a "Select query" with a where clause
                            
                                Excel/VBA: How to paste SQL query with proper string formatting
                            
                                SQL Server 2008: should I be using Windows auth or SQL Server auth?
                            
                                Random values in MySql query with pagination
                            
                                SQL query help with non-unique duplicates
                            
                                Difference in time between records
                            
                                Set sort-order field based on alphabetically ordering of another field
                            
                                SQL script to change all table references in all stored procedures

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With