Since both a <code>Table Scan</code> and a <code>Clustered Index Scan</code> essentially scan all records in the table, why is a Clustered Index Scan supposedly better? As an example - what's the performance difference between the following when there are many records?: <pre class="prettyprint"><code>declare @temp table( SomeColumn varchar(50) ) insert into @temp select 'SomeVal' select * from @temp ----------------------------- declare @temp table( RowID int not null identity(1,1) primary key, SomeColumn varchar(50) ) insert into @temp select 'SomeVal' select * from @temp </code></pre>

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map. A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code>. A heap table, however, requires a second write to the IAM. If your query has a <code>RANGE</code> operator (e.g.: <code>SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100</code>), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering. And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan. So: <ul> <li>For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.</li> <li>For a query with a <code>WHERE</code> clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.</li> <li>For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.</li> <li>For <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.</li> </ul> Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable): <ul> <li> <code>INSERT</code> performance: clustered index wins by about 3% due to the second write needed for a heap.</li> <li> <code>UPDATE</code> performance: clustered index wins by about 8% due to the second lookup needed for a heap.</li> <li> <code>DELETE</code> performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.</li> <li>single <code>SELECT</code> performance: clustered index wins by about 16% due to the second lookup needed for a heap.</li> <li>range <code>SELECT</code> performance: clustered index wins by about 29% due to the random ordering for a heap.</li> <li>concurrent <code>INSERT</code>: heap table wins by 30% under load due to page splits for the clustered index.</li> </ul>

What's the difference between a Table Scan and a Clustered Index Scan?

Tags:

sql

sql-server

indexing

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

As an example - what's the performance difference between the following when there are many records?:

declare @temp table(     SomeColumn varchar(50) )  insert into @temp select 'SomeVal'  select * from @temp  -----------------------------  declare @temp table(     RowID int not null identity(1,1) primary key,     SomeColumn varchar(50) )  insert into @temp select 'SomeVal'  select * from @temp

837

asked Aug 20 '08 20:08

Seibar

1 Answers

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

So:

For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.
For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.
For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.
For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):

INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.

answered Sep 22 '22 04:09

Mark Brackett

Related questions
                            
                                how to insert datetime into the SQL Database table?
                            
                                Insert Picture into SQL Server 2005 Image Field using only SQL
                            
                                Difference between Oracle's plus (+) notation and ansi JOIN notation?
                            
                                How do I use properly CASE..WHEN in MySQL
                            
                                How to drop all NOT NULL constraints from a PostgreSQL table in one go
                            
                                Can I use VARCHAR as the PRIMARY KEY?
                            
                                How to check if an array is empty in Postgres
                            
                                SQL Server 2008- Get table constraints
                            
                                how to execute a .sql script on heroku?
                            
                                How to find the users list in oracle 11g db?
                            
                                How to compare two dates to find time difference in SQL Server 2005, date manipulation
                            
                                How to retrieve field names from temporary table (SQL Server 2008)
                            
                                Getting return value from stored procedure in C#
                            
                                ORA-12560: TNS:protocol adaptor error
                            
                                Postgresql: Check if Schema Exists?
                            
                                SELECT max(x) is returning null; how can I make it return 0?
                            
                                MySQL, coalesce equivalent for empty values?
                            
                                SQLite - ORDER BY RAND()
                            
                                Loading .sql files from within PHP
                            
                                Get execution time of PostgreSQL query

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With