Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

nonclustered index on primary key column?

I have a primary key on a table (say ContactID). SQL Server automatically creates and maintains a clustered INDEX on this column. When I ran the Tuning Advisor (against a performance trace), it seemed to recommend another INDEX on the same column - a NON CLUSTERED index on the contactID column. How will this help - since there is already a clustered index on the column?

like image 286
user2736158 Avatar asked Nov 18 '13 20:11

user2736158


People also ask

Can we create non-clustered index on primary key column?

When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.

Can a primary key be nonclustered?

Yes you can, by specifying the primary key be nonclustered.

Can we index a primary key column?

We can apply a Primary Key constraint and a Clustered Index constraint to different columns in the same table or to the same column. It's a common practice to apply a Clustered Index to a Primary Key. Since the Primary Key is often used to connect data, it's frequently used in searches.

What will happen when a non-clustered index is created on a column of the table?

A non-clustered index doesn't sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place. This is similar to a textbook where the book content is located in one place and the index is located in another.


2 Answers

In very specific circumstances, such as the table containing ContactID is large, and/or the rows are large (i.e. lots of large varchars), scanning the Clustered Index can be very IO/memory intensive compared to having ContactID as the Clustered Index AND as a Non Clustered Index.

If a query requires a scan of the table, by ContactID, and there is only a Clustered Index on ContactID, then the entire row of data is read from disk. However, if you also have a Non Clustered index on ContactID, then only the ContactID is read from disk.

This has to do with the way Clustered vs Non Clustered are stored on disk. Clustered Indexes are stored by ContactID, but all of the row data is also stored with it. Say each row is large enough to take up one page (8KB), then scanning 100,000,000 rows requires 800,000,000 kb of disk io.

A Non Clustered index would only store ContactID in its "row". Assuming ContactID is 8 bytes (bigint), then 1000 rows of the Non Clustered index can fit in one page (8KB). Now, scanning 100,000,000 rows, by ContactID, only requires (100,000,000 / 1000 * 8) = 800,000 KB of disk io.

If the query being analyzed is called fairly frequently, 800,000 KB compared to 800,000,000 KB is significant.

However, as Evadman suggested, the Tuning Advisor is only looking at a particular workload. In most cases, an extra Non Clustered index will be just extra workload for inserts/deletes.

Real life example, I work with a table that has lots of varchars. The Clustered Index is 5521 MBs. There are several queries, that are called many times a second, that end up doing partial scans of the Clustered Index column (lets call it P_ID). A Non Clustered index on P_ID is 211 MBs (26x smaller than Clustered Idx). This resulted in a substantial decrease in query execution time as well as load on disk and memory.

Bonus: Query to find out size of Clustered and Non Clustered indexes

DECLARE @TableName VARCHAR(200)
SET @TableName = 'NAME_OF_YOUR_TABLE'

SELECT
    OBJECT_NAME(i.OBJECT_ID) AS TableName,
    i.name AS IndexName,
    i.index_id AS IndexID,
    8 * SUM(a.used_pages) AS 'Indexsize(KB)',
    (8 * SUM(a.used_pages)) / 1024 AS 'Indexsize(MB)'
FROM sys.indexes AS i
JOIN sys.partitions AS p 
    ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id
JOIN sys.allocation_units AS a 
    ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.object_id) = @TableName
GROUP BY i.OBJECT_ID,i.index_id,i.name
ORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id
like image 75
Sully Avatar answered Sep 21 '22 11:09

Sully


If the query tuning adviser is recommending a non-clustered index on the primary key, it is also recommending a clustered index on another column (or columns).

A primary key is a constraint, not an index. The assumption made by MS SQL Server is that the primary key is also a major way of how data is retrieved from the table (via 'where ContactID = 2' or a join between tables on ContactID). That assumption means that a clustered index is also automatically created on the columns that make up the primary key. There are other reasons for this behavior as well, but let's keep this simple for now.

Now, if most queries against the table are on contact first name (ContactFirstName field) in something like 'Where ContactFirstName LIKE 'Muh%', then SQL server is going to recommended changing the clustered index from ContactID to ContactFirstName since a table can only have 1 clustered index. The Primary Key constraint will still exist (and prevent dupe rows) but the data in the table will be physically ordered by ContactFirstName.

The workload consumed by the tuning adviser will dictate what the tuning adviser recommends. The tuning adviser will also only use a percent of the highest resource queries from the workload, not the entire workload, in making the determination.

like image 31
Evadman Avatar answered Sep 22 '22 11:09

Evadman