SQL Server heap vs. clustered index

Tags: sql-server, sql-server-2008, clustered-index, heap, heap-table

I am using SQL Server 2008. I know that a table with no clustered index is called a heap; otherwise its storage model is called a clustered index (B-Tree).

I want to learn more about what exactly heap storage means, what it looks like, and whether it is organized as a "heap" data structure (e.g. min-heap, max-heap). Any recommended reading? I want to know a bit more about the internals, but not too deep. :-)

Thanks in advance, George

815

asked Aug 27 '09 14:08

George2

2 Answers

Heap storage has nothing to do with the heap data structure (min-heap, max-heap).

Heap just means that the records themselves are not ordered (i.e. not linked to one another).

When you insert a record, it just gets inserted into the free space the database finds.

Updating a row in a heap-based table does not affect other records (though it does affect secondary indexes).

If you create a secondary index on a heap table, the RID (a kind of physical pointer to the storage location) is used as a row pointer.
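
As a rough illustration of the points above, the following sketch creates a heap and a secondary index on it, then inspects both. The table `dbo.heap_demo` and the index name are hypothetical. `%%physloc%%` and `sys.fn_PhysLocFormatter` are undocumented features (available since SQL Server 2008), so treat that last query as a diagnostic aid only.

```sql
-- Hypothetical table: no clustered index and no PRIMARY KEY,
-- so SQL Server stores it as a heap.
CREATE TABLE dbo.heap_demo (
    id   INT         NOT NULL,
    name VARCHAR(50) NOT NULL
);

-- Rows go wherever free space is found; insertion order is not a sort order.
INSERT INTO dbo.heap_demo (id, name) VALUES (3, 'c');
INSERT INTO dbo.heap_demo (id, name) VALUES (1, 'a');
INSERT INTO dbo.heap_demo (id, name) VALUES (2, 'b');

-- A secondary (nonclustered) index on a heap uses the RID as its row pointer.
CREATE NONCLUSTERED INDEX IX_heap_demo_name ON dbo.heap_demo (name);

-- The heap itself is listed with index_id = 0 and type_desc = 'HEAP'.
SELECT name, index_id, type_desc
FROM   sys.indexes
WHERE  object_id = OBJECT_ID('dbo.heap_demo');

-- Undocumented: formats each row's physical location as (file:page:slot).
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS rid, id, name
FROM   dbo.heap_demo;
```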

Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

Updating a row in a clustered table causes relinking of the B-Tree, i.e. updating internal pointers in other records.

If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called a uniquifier is appended to the index key, which makes it unique (and larger in size).

It is also worth noting that creating a secondary index on a column makes the values of the clustered index's key part of the secondary index's key.

By creating an index on a clustered table, you in fact always get a composite index:
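
A minimal sketch of a non-unique clustered index, assuming a hypothetical table `dbo.events_demo`. The uniquifier itself is hidden and cannot be selected, so the example only shows that duplicate key values are accepted and that the index is reported as non-unique.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.events_demo (
    event_date DATETIME     NOT NULL,
    payload    VARCHAR(100) NOT NULL
);

-- No UNIQUE keyword: duplicate event_date values are allowed, and
-- SQL Server adds the hidden uniquifier to tell such rows apart.
CREATE CLUSTERED INDEX CX_events_demo_event_date
    ON dbo.events_demo (event_date);

INSERT INTO dbo.events_demo (event_date, payload) VALUES ('20090827', 'first');
INSERT INTO dbo.events_demo (event_date, payload) VALUES ('20090827', 'second');

-- is_unique shows whether the index was declared UNIQUE.
SELECT name, type_desc, is_unique
FROM   sys.indexes
WHERE  object_id = OBJECT_ID('dbo.events_demo');
```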

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 ON mytable (col1, col2, col3, col4)

CREATE INDEX IX_mytable_5678 ON mytable (col5, col6, col7, col8)

Index IX_mytable_5678 is in fact an index on the following columns:

col5
col6
col7
col8
col1
col2
col3
col4
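
One practical consequence, sketched below under assumed column types (all `INT`, plus a hypothetical `payload` column that belongs to no index): a query that reads only `col1` to `col8` can in principle be answered from `IX_mytable_5678` alone, while a query that also needs `payload` has to go back to the clustered index. Which plan the optimizer actually picks depends on the data and statistics.

```sql
-- Column types and the payload column are assumptions for illustration.
CREATE TABLE dbo.mytable (
    col1 INT NOT NULL, col2 INT NOT NULL, col3 INT NOT NULL, col4 INT NOT NULL,
    col5 INT NOT NULL, col6 INT NOT NULL, col7 INT NOT NULL, col8 INT NOT NULL,
    payload VARCHAR(100) NULL
);

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234
    ON dbo.mytable (col1, col2, col3, col4);

CREATE INDEX IX_mytable_5678
    ON dbo.mytable (col5, col6, col7, col8);

-- Every referenced column is stored in IX_mytable_5678
-- (col1 and col2 are there as part of the clustered key).
SELECT col5, col1, col2
FROM   dbo.mytable
WHERE  col5 = 42;

-- payload is not in the secondary index, so the row must be fetched
-- from the clustered index by its key (or the table is scanned).
SELECT col5, payload
FROM   dbo.mytable
WHERE  col5 = 42;
```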

This has one more side effect:

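A `DESC` clause in a single-column index on a clustered table makes sense in SQL Server: since the clustered key is appended, the index is effectively composite, and the sort direction of each column matters.

Assuming `id` is the clustered index key, this index: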

CREATE INDEX IX_mytable ON mytable (col1)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id

while this one:

CREATE INDEX IX_mytable ON mytable (col1 DESC)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id DESC
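
The same idea as a self-contained sketch, assuming a hypothetical table `dbo.desc_demo` whose clustered primary key is `id`. The index on `(col1 DESC)` is effectively ordered by `col1 DESC, id ASC`, so it matches that ordering when read forward and the exact opposite ordering when read backward. The comments describe what the index makes possible, not a guaranteed plan.

```sql
-- Hypothetical table; id is the clustered key.
CREATE TABLE dbo.desc_demo (
    id   INT IDENTITY(1,1) NOT NULL
         CONSTRAINT PK_desc_demo PRIMARY KEY CLUSTERED,
    col1 INT NOT NULL
);

-- Effective key order: col1 DESC, id ASC.
CREATE INDEX IX_desc_demo_col1_desc ON dbo.desc_demo (col1 DESC);

-- Matches the index order (forward scan possible).
SELECT TOP 100 col1, id
FROM   dbo.desc_demo
ORDER BY col1 DESC, id;

-- Exact reverse of the index order (backward scan possible).
SELECT TOP 100 col1, id
FROM   dbo.desc_demo
ORDER BY col1, id DESC;

-- Neither the index order nor its reverse: the index alone
-- cannot deliver this ordering without an extra sort.
SELECT TOP 100 col1, id
FROM   dbo.desc_demo
ORDER BY col1, id;
```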

166

answered Sep 28 '22 12:09

Quassnoi

Heaps are just tables without a clustering key - without a key that enforces a certain physical order.

I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.

In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.

See Kimberly Tripp's excellent blog posts "GUIDs as PRIMARY KEYs and/or the clustering key" and "The Clustered Index Debate Continues" for explanations of why you should always have a clustering key, and why a GUID is a horrible clustering key.

My recommendation would be:

in 99% of all cases, try to use an INT IDENTITY as your primary key and let SQL Server make that the clustering key as well
exception #1: if you're bulk loading huge amounts of data, you might be fine without a primary / clustering key for your temporary table
exception #2: if you must use a GUID as your primary key, then set your clustering key to a different column - preferably an INT IDENTITY - and I would even create a separate INT column just for that purpose, if no other column can be used
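
A possible shape for the default case and for exception #2, using hypothetical table and constraint names. In the second table the GUID remains the primary key but is declared `NONCLUSTERED`, and a separate `INT IDENTITY` column carries the clustered index.

```sql
-- Default case: INT IDENTITY primary key, clustered by default.
CREATE TABLE dbo.customers_demo (
    customer_id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_customers_demo PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

-- Exception #2: the GUID must be the primary key, so keep it NONCLUSTERED
-- and cluster on a narrow, ever-increasing INT IDENTITY column instead.
CREATE TABLE dbo.orders_demo (
    order_guid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_orders_demo_order_guid DEFAULT NEWID()
        CONSTRAINT PK_orders_demo PRIMARY KEY NONCLUSTERED,
    order_seq  INT IDENTITY(1,1) NOT NULL,
    placed_at  DATETIME NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX CX_orders_demo_order_seq
    ON dbo.orders_demo (order_seq);
```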

Marc

answered Oct 02 '22 12:10

marc_s
