Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does SQL Server 2008 Non-Clustered Index Contain the Clustered Index Fields?

OK, I need this spelled out one more time. I've read the articles on-line, and I still haven't found a definitive answer.

In SQL Server 2008, I have a "core" table with about 50k records and lots of read activity that is used in the same way in all queries. This data is updated once a month, and read hundreds of times a second.

The data has a clustered index on the fields as they are frequently accessed. Let's say that the clustered index is:

CLUSTERED INDEX

Field1 int
Field2 int
Field3 int
Field4 int
Field5 int

Now, there is not a whole lot more data than that, so it would make sense to just put the extra couple of columns into "Included Columns", but SQL Server doesn't allow included columns on the Clustered Index.

So, we have a second index with essentially the same fields as the Clustered Index, with the other columns as "Included Columns". However, from what I've read, I believe this may be redundant?

COVERING INDEX (non-clustered)

Field1 int
Field2 int
Field3 int
Field4 int
Field5 int

INCLUDED COLUMNS

Field6 varchar(96)
Field7 varchar(96)

Does the non-clustered Index ALREADY have the columns from the clustered index defined in it?

If so, how could this second index be created with NO columns at all (besides what is already in the clustered index)? In other words, I'd like to say "This index is exactly the same as the clustered index... with a couple of Included Columns".

Or, would it be better to just put ALL of the columns into the clustered index (including the two that don't identify the record)? The varchar columns do get updated more frequently (a few times a day instead of once a month), so I would have liked to keep them out of the clustered index, but I think that they are deep enough that they won't affect the index tree enough to cause any rebalancing when a change occurs.

So, is there an efficient way to set up these indexes so that all of the columns of this table are available through the index without going back to the table?

like image 984
Flipster Avatar asked Jan 02 '11 16:01

Flipster


3 Answers

Yes - a NonClustered Index accesses the data in the table via the clustered key (when the table has a clustered key, and the Row ID when it does not), so it will include the clustered index fields automatically. This is also a reason why a change to the clustered index, forces a rebuild of all non-clustered indexes.

The additional NC index with the 2 included fields could be valid if that index satisfies a large number of queries, but I'm not sure that is solving the right problem.

Including 2 more fields within the Clustered Key is not ideal, now it is confirmed within the NC index, you can see every index on that table includes the clustered key within it bulking each index out.

This is the main reason you want your clustered key to be as narrow as possible, if anything, you should examine your clustered key as ask why you are choosing a 5 field clustered key, and will that choice lead to fragmentation?

You may be better off with an artificial value (Identity) for the clustered key, and use a unique NC index to enforce the uniqueness you have with the 5 field clustered key.

like image 150
Andrew Avatar answered Oct 21 '22 04:10

Andrew


A clustered index does not need includes. Includes means stored extra data at the lowest level of the index tree. This is the data in a clustered index. So you don't need an overlapping index

However if memory footprint is your concern, then you need to shrink the table. With 50k rows I would consider a smallint surrogate key starting at -32768. Then, you remove the overhead of the C key in every NC index. This means you can have a covering index as mentioned in your question.

Note that once your execution plans are cached and the data is in cache, then your queries will come from memory. Your usage means it will stay in cache for some time. Lack of updates means you won't get statistics-driven recompiles.

However, if your data is almost static, then why call SQL Server at all if performance is a concern? Cache it. Remove the network round trip which is probably your biggest overhead based on my caching comments. We outsource some lookups and caching to our clients to reduce server load (we have 50k writes in around 20 seconds at peak load)

like image 23
gbn Avatar answered Oct 21 '22 04:10

gbn


it would make sense to just put the extra couple of columns into "Included Columns", but SQL Server doesn't allow included columns on the Clustered Index

Including extra columns is impossible, because a clustered index already contains all columns. That's why the index is called clustered.

So, we have a second index with essentially the same fields as the Clustered Index, with the other columns as "Included Columns". However, from what I've read, I believe this may be redundant?

Yeah, it's probably redundant. There are some rare exceptions where the clustered index does not fit into memory.

Does the non-clustered Index ALREADY have the columns from the clustered index defined in it?

Probably: a non-clustered index contains a pointer to the clustered index. If the clustered index is unique, this pointer consists of all clustered index fields. (In most situations, these fields correspond with the primary key.)

So, is there an efficient way to set up these indexes so that all of the columns of this table are available through the index without going back to the table?

In the example you post it looks like the clustered index is sufficient, and you do not need any other indexes to avoid a table lookup. You can verify this by running queries and looking for "key lookup" or "rid lookup" operations.

like image 1
Andomar Avatar answered Oct 21 '22 02:10

Andomar