I am using SQL Server 2008. I know that if a table has no clustered index it is called a heap; otherwise the storage model is a clustered index (B-Tree).
I want to learn more about what exactly heap storage means, what it looks like, and whether it is organized as a "heap" data structure (e.g. min-heap, max-heap). Any recommended readings? I want to know a bit more about the internals, but not too deep. :-)
Thanks in advance, George
If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
In a heap, rows are stored in no particular order, whereas in a clustered table the rows are ordered by the clustered index key. Data pages are not linked in a heap, whereas in a clustered table they are linked, which gives faster sequential access.
On the other hand, with a clustered index the records are already sorted by the key, so a SELECT that filters or orders on the clustered index key is faster, and the remaining columns are read from the same leaf row without an extra lookup.
A heap is a table without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order.
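To check whether a given table is currently stored as a heap, one option is to look at the sys.indexes catalog view. This is a minimal sketch; dbo.mytable is a placeholder name, not a table from the question.

```sql
-- dbo.mytable is a placeholder; substitute your own table name.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.index_id,
       i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.mytable')
  AND i.index_id IN (0, 1);
-- index_id = 0, type_desc = 'HEAP'      : the table is stored as a heap
-- index_id = 1, type_desc = 'CLUSTERED' : the table has a clustered index
```

A table has either the index_id 0 row or the index_id 1 row, never both; nonclustered indexes use index_id values of 2 and above.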
Heap storage has nothing to do with the heap data structure (min-heap, max-heap).
Heap just means the records themselves are not ordered (i.e. not linked to one another).
When you insert a record, it just gets inserted into the free space the database finds.
If you create a secondary index on a HEAP table, the RID (a kind of physical pointer to the storage space) is used as a row pointer.
Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.
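To see what a RID looks like, the sketch below creates a throwaway heap and prints each row's physical location. It relies on %%physloc%% and sys.fn_PhysLocFormatter, which are undocumented and unsupported features available from SQL Server 2008, so treat this as an exploration aid only. The table dbo.heap_demo is hypothetical.

```sql
-- Hypothetical demo table with no clustered index, so it is stored as a heap.
CREATE TABLE dbo.heap_demo
(
    id      INT         NOT NULL,
    payload VARCHAR(50) NOT NULL
);

INSERT INTO dbo.heap_demo (id, payload) VALUES (3, 'c');
INSERT INTO dbo.heap_demo (id, payload) VALUES (1, 'a');
INSERT INTO dbo.heap_demo (id, payload) VALUES (2, 'b');

-- Undocumented: formats the physical row location as (file:page:slot).
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS rid,
       id,
       payload
FROM dbo.heap_demo;

DROP TABLE dbo.heap_demo;
```

The file, page and slot numbers are the same kind of information a nonclustered index on a heap stores to locate a row.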
If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.
This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called a uniquifier is appended to the index key, which makes it unique (and larger in size).
It is also worth noting that creating a secondary index on a clustered table makes the values of the clustered index's key part of the secondary index's key.
CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 ON mytable (col1, col2, col3, col4)
CREATE INDEX IX_mytable_5678 ON mytable (col5, col6, col7, col8)
Index IX_mytable_5678 is in fact an index on the following columns:
col5
col6
col7
col8
col1
col2
col3
col4
This has one more side effect: a DESC condition in a single-column index on a clustered table makes sense in SQL Server.
This index:
CREATE INDEX IX_mytable ON mytable (col1)
can be used in a query like this:
SELECT TOP 100 *
FROM mytable
ORDER BY col1, id
while this one:
CREATE INDEX IX_mytable ON mytable (col1 DESC)
can be used in a query like this:
SELECT TOP 100 *
FROM mytable
ORDER BY col1, id DESC
Heaps are just tables without a clustering key - without a key that enforces a certain physical order.
I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.
In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.
See Kimberly Tripp's excellent blog posts GUIDs as Primary and/or the clustering key and The Clustered Index Debate Continues for excellent explanations why you should always have a clustering key, and why a GUID is a horrible clustering key.
My recommendation would be:
- use INT IDENTITY as your primary key and let SQL Server make that the clustering key as well
- if your primary key has to be a GUID, put the clustering key on something else, such as an INT IDENTITY - and I would even create a separate INT column just for that purpose, if no other column can be used
Marc
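As a rough illustration of the second case (a GUID primary key with a separate clustering key), the sketch below keeps the GUID as a nonclustered primary key and clusters on a dedicated INT IDENTITY column. All table, column and constraint names are made up for the example.

```sql
-- Hypothetical table: GUID primary key, but clustered on an INT IDENTITY column.
CREATE TABLE dbo.customer_demo
(
    customer_guid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_customer_demo_guid DEFAULT NEWID(),
    row_id        INT IDENTITY(1, 1) NOT NULL,  -- exists only to serve as the clustering key
    name          NVARCHAR(100) NOT NULL,
    -- NONCLUSTERED stops SQL Server from clustering on the primary key by default.
    CONSTRAINT PK_customer_demo PRIMARY KEY NONCLUSTERED (customer_guid)
);

-- Narrow, unique, ever-increasing clustering key.
CREATE UNIQUE CLUSTERED INDEX CX_customer_demo_row_id
    ON dbo.customer_demo (row_id);
```

With this layout every nonclustered index carries the 4-byte row_id as its row pointer instead of the 16-byte GUID.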