
SQL Server Performance: Non-clustered Index + INCLUDE columns vs. Clustered Index - equivalent?

Hello SQL Server engine experts; please share with us a bit of your insight...

As I understand it, INCLUDE columns on a non-clustered index allow additional, non-key data to be stored in the index's leaf pages.

I am well aware of the performance benefit a clustered index has over a non-clustered index, simply because the engine takes one fewer step in a retrieval to arrive at the data on disk.

However, since INCLUDE columns live in a non-clustered index, can the following query be expected to have essentially the same performance across scenarios 1 and 2 since all columns could be retrieved from the index pages in scenario 2 rather than ever resorting to the table data pages?

QUERY

SELECT A, B, C FROM TBL ORDER BY A

SCENARIO 1

CREATE CLUSTERED INDEX IX1 ON TBL (A, B, C);

SCENARIO 2

CREATE NONCLUSTERED INDEX IX1 ON TBL (A) INCLUDE (B, C);
Tahbaza asked Aug 02 '10 01:08



2 Answers

Indeed, a non-clustered index with covering INCLUDE columns can play exactly the same role as a clustered index. The cost comes at update time: the more columns you include, the more indexes have to be updated when an included column's value changes in the base table (the clustered index). Also, more included columns means more data: the database becomes larger, which can complicate maintenance operations.

In the end, it is a balance you have to find between the covering value of the additional indexes and included columns vs. the cost of updates and the increase in data size.
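
One rough way to see the update cost is to compare how often each index gets modified. The sketch below is illustrative only: dbo.TblDemo, its columns and the index names are hypothetical, and reading sys.dm_db_index_usage_stats requires the appropriate VIEW SERVER STATE / VIEW DATABASE STATE permission.

```sql
-- Hypothetical demo table; names are assumptions, not from the question.
CREATE TABLE dbo.TblDemo (
    Id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_TblDemo PRIMARY KEY CLUSTERED,
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    D INT NOT NULL
);

CREATE NONCLUSTERED INDEX IX_TblDemo_A
    ON dbo.TblDemo (A) INCLUDE (B, C);

INSERT INTO dbo.TblDemo (A, B, C, D)
VALUES (1, 10, 100, 1000), (2, 20, 200, 2000);

-- B is an included column: both the clustered index and IX_TblDemo_A must change.
UPDATE dbo.TblDemo SET B = B + 1 WHERE A = 1;

-- D is not part of IX_TblDemo_A: only the clustered index must change.
UPDATE dbo.TblDemo SET D = D + 1 WHERE A = 1;

-- Compare the modification counters recorded for each index.
SELECT i.name, i.type_desc, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.TblDemo');
```

If the counters behave as described in the answer, the clustered index should show more modifications than the non-clustered one, since the update to D never touches IX_TblDemo_A.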

Remus Rusanu answered Oct 05 '22 06:10


For this example you may actually get better performance with the non-clustered index. But, it really depends on additional information you haven't provided. Here are some thoughts.

SQL Server stores information in 8 KB pages; this includes both data and indexes. If your table contains only columns A, B and C, the data pages and the non-clustered index pages will be approximately equal in number. But if the table has more columns, the data will need more pages, while the number of index pages stays the same.

So, in a table with more columns than your query needs, the query will work better with the non-clustered covering index (an index containing every column the query uses). It has to read fewer pages to return the results you want.

Of course, performance differences may not be seen until you get a very large number of rows.
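
The page-count argument can be checked on a test database. This is a minimal sketch, assuming a hypothetical table dbo.TblWide with a wide Filler column standing in for "more columns in the table"; the table, index names and row count are made up for illustration.

```sql
-- Hypothetical wide table; Filler stands in for the extra columns.
CREATE TABLE dbo.TblWide (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    Filler CHAR(500) NOT NULL DEFAULT 'x'
);

-- Generate some rows from a system catalog cross join.
INSERT INTO dbo.TblWide (A, B, C)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 0, 0
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;

-- Scenario 1 and scenario 2 from the question, side by side on one table.
CREATE CLUSTERED INDEX CIX_TblWide ON dbo.TblWide (A, B, C);
CREATE NONCLUSTERED INDEX IX_TblWide_A ON dbo.TblWide (A) INCLUDE (B, C);

-- Leaf-level page counts per index.
SELECT i.name, i.type_desc, ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.TblWide'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;

-- Logical reads for the same query forced through each index.
SET STATISTICS IO ON;
SELECT A, B, C FROM dbo.TblWide WITH (INDEX(CIX_TblWide))  ORDER BY A;
SELECT A, B, C FROM dbo.TblWide WITH (INDEX(IX_TblWide_A)) ORDER BY A;
SET STATISTICS IO OFF;
```

The index hints are only there to force each access path for comparison; in normal use the optimizer would be left to choose. Dropping the Filler column from the sketch should bring the two page counts close together, which is the three-column case described above.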

bobs answered Oct 05 '22 06:10