
Composite clustered index vs non-unique clustered index. Which is better/worse in this case?

I have a database where all tables include a Site column (char(4)) and a PrimaryId column (int).

Currently the clustered index on all tables is the combination of these two columns. Many customers only have one site so in those cases I think it definitely makes sense to change the clustered index to only include the PrimaryId.

In cases where there are multiple sites though, I'm wondering whether it would still be advantageous to only use the PrimaryId as the clustered index? Might having a smaller clustered index produce better performance than having a unique one?

In case it's relevant, there are generally not going to be more than a few sites. 10 sites would be a lot.

BVernon asked Feb 15 '23 08:02



2 Answers

The answer is simple: a UNIQUE index is generally better than a NON-UNIQUE one. There is some math behind it, but the more selective the key (the closer it is to unique), the faster the server can look up a record from the index.

A CLUSTERED index is useful because it stores the table's rows in key order, and it is usually a good idea to create the CLUSTERED INDEX on a UNIQUE key.

A CLUSTERED INDEX on the PRIMARY KEY gives very good performance with large data. If your data volume is not large, it will not matter much.

particle answered Feb 17 '23 11:02



I recently read an article about how nonclustered indexes locate table rows. I will try to summarize what I believe is relevant to your question.

There are two types of tables (in the context of indexes):

  • heap - a table without a clustered index
  • clustered table - a table with a clustered index

In the first case, a nonclustered index locates rows using RID-based (row identifier) bookmarks, which have the following format:

file number - page number - row number

and a nonclustered index looks like this:

[Image: nonclustered index entries on a heap, with the RID bookmark shown in red]
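Since the image is not available, here is a minimal Python sketch of the same idea: index entries on a heap are pairs of a key and a RID. The `Rid` class, the sample names, and the page numbers are all made up for illustration. The 2-byte file, 4-byte page, 2-byte slot widths reflect the commonly documented 8-byte RID size, but the byte order used here is only an assumption, not SQL Server's actual on-disk layout.

```python
import struct
from typing import NamedTuple


class Rid(NamedTuple):
    """Hypothetical model of a row identifier: file, page, row (slot) number."""
    file_id: int
    page_id: int
    slot_id: int

    def __str__(self) -> str:
        return f"{self.file_id}:{self.page_id}:{self.slot_id}"

    def pack(self) -> bytes:
        # Assumed layout for illustration: 2-byte file, 4-byte page, 2-byte slot.
        return struct.pack("<HIH", self.file_id, self.page_id, self.slot_id)


# A nonclustered index on a heap, modeled as (key, RID) pairs sorted by key.
index_entries = sorted([
    ("Smith", Rid(1, 4712, 3)),
    ("Adams", Rid(1, 4710, 0)),
    ("Jones", Rid(1, 4712, 1)),
])

for key, rid in index_entries:
    print(key, "->", rid, f"({len(rid.pack())} bytes)")
```

The point is that the bookmark is a physical address, so its size does not depend on which columns the table has.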

Generally speaking, the rows of a heap do not move; once they have been inserted into a page they remain on that page. To be more technically-precise: rows in a heap seldom move, and when they do move, they leave a forwarding address at the old location. The rows of a clustered index, however, can move; that is, they can be relocated to another page during data modification or index reorganization.
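To make the "forwarding address" idea concrete, here is a toy Python sketch. The dictionary layout and the `move_row` and `fetch` helpers are hypothetical and only model the behavior described above; they are not how SQL Server stores pages.

```python
# Hypothetical in-memory heap: (page, slot) -> either a row or a forwarding stub.
heap = {
    (100, 0): {"row": "Adams"},
    (100, 1): {"row": "Jones"},
}


def move_row(heap, old, new):
    """Relocate a row and leave a forwarding address at the old location."""
    heap[new] = heap[old]
    heap[old] = {"forward_to": new}


def fetch(heap, rid):
    """Follow forwarding addresses until the actual row is reached."""
    entry = heap[rid]
    while "forward_to" in entry:
        entry = heap[entry["forward_to"]]
    return entry["row"]


# The nonclustered index keeps the original RID (100, 1) even after the move.
move_row(heap, (100, 1), (205, 0))
print(fetch(heap, (100, 1)))
```

Because the old location still points at the row, the RID stored in the nonclustered index stays valid without being rewritten, at the cost of an extra hop on lookup.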

In the second case, the nonclustered index uses the key of the clustered index as the bookmark, and the clustered index key itself should meet several criteria:

  1. it must be unique
  2. it should be short
  3. it should be static

I am going to describe the first criterion (the others are described in the link below):

Each index entry bookmark must allow SQL Server to find the one row in the table that corresponds to that entry. If you create a clustered index that is not unique, SQL Server will make the clustered index unique by generating an additional value that "breaks the tie" for duplicate keys. This extra value, generated by SQL Server to create uniqueness, is called the uniquifier and is transparent to any client application. You should carefully consider whether or not to allow duplicates in a clustered index, for the following reasons:

  1. Generating uniquifiers is extra overhead. SQL Server must decide, at insert time, if a new row's key is a duplicate of an existing row's key; and, if so, generate a uniquifier value to add to the new row.

  2. The uniquifier is a meaningless piece of information; a meaningless piece of information that is being propagated into the table's nonclustered indexes. It's usually better to propagate a meaningful piece of information into the nonclustered indexes.
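Applied to the question, the two points above can be sketched in Python like this. It is a toy model under stated assumptions: the `NonUniqueClusteredIndex` class is hypothetical, the site codes are invented, and I am assuming that PrimaryId values can repeat across sites. The first row for a key gets uniquifier 0 here purely for simplicity; how SQL Server actually stores the value is not modeled.

```python
from collections import defaultdict


class NonUniqueClusteredIndex:
    """Toy model: rows keyed by (key, uniquifier) so every row is addressable."""

    def __init__(self):
        self.rows = {}                 # (key, uniquifier) -> row data
        self._next = defaultdict(int)  # key -> next uniquifier to hand out

    def insert(self, key, data):
        # The duplicate check happens on every insert; that is the overhead.
        uniquifier = self._next[key]
        self._next[key] += 1
        bookmark = (key, uniquifier)
        self.rows[bookmark] = data
        return bookmark


# Clustering on PrimaryId alone when two sites reuse the same PrimaryId.
idx = NonUniqueClusteredIndex()
b1 = idx.insert(42, {"Site": "NYC1", "PrimaryId": 42})
b2 = idx.insert(42, {"Site": "LON1", "PrimaryId": 42})
b3 = idx.insert(43, {"Site": "NYC1", "PrimaryId": 43})
print(b1, b2, b3)
```

Every nonclustered index would carry those `(PrimaryId, uniquifier)` bookmarks, whereas with the composite key it would carry `(Site, PrimaryId)`, which at least means something to a query.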

The whole article can be found here.

gotqn answered Feb 17 '23 11:02
