Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the advantage of using an INCLUDE column with a SQL Server index?

CREATE NONCLUSTERED INDEX index_name 
    ON <object> ( column [ ASC | DESC ] [ ,...n ] ) 
    [ INCLUDE ( column_name [ ,...n ] ) ]
    [ WHERE <filter_predicate> ]

In the above syntax we will specify non-key columns in the INCLUDE. What is the advantage in specifying non-key columns?

like image 555
Anoop Avatar asked May 03 '09 22:05

Anoop


People also ask

Why we use include in index in SQL?

The include clause allows us to make a distinction between columns we would like to have in the entire index (key columns) and columns we only need in the leaf nodes ( include columns). That means it allows us to remove columns from the non-leaf nodes if we don't need them there.

What are included columns with SQL Server indexes?

Included columns can be used to create a covering indexes without including all the data into the key columns. This covering index has the advantage that the index contains all the columns that are needed for a query.

What is the advantage of having an index on a column?

It helps us find the rows or the values to be searched faster. They have various advantages like increased performance in searching for records, sorting records, grouping records, or maintaining a unique column.

What is an include clause in an index?

The INCLUDE clause, applicable only on unique indexes, specifies additional columns to be appended to the set of index key columns. Any columns included with this clause are not used to enforce uniqueness. These included columns can improve the performance of some queries through index only access.


1 Answers

Let's assume you have an employee table like so:

CREATE TABLE Employee(EmployeeID INT IDENTITY(1,1) PRIMARY KEY,
                      LastName VARCHAR(50),
                      FirstName VARCHAR(50),
                      HireDate DATETIME,
                      Salary DECIMAL)

You would have the primary clustered key on EmployeeID, and possibly a non-clustered key on (LastName,FirstName) in order to be able to find employees by name.

CREATE INDEX NameIndex ON Employee(LastName ASC, FirstName ASC)

Now if you need to find "Joe Murphy" and retrieve his hire date and salary, what happens is an index seek in your name-based non-clustered key (which is good), but then in order to fetch the hire date and salary, SQL Server needs to do a so-called bookmark lookup into the actual table data to get the record for Joe Murphy. This will most likely incur one or several physical disk accesses (which is bad in terms of performance).

HOWEVER: if your name-based non-clustered index also specifies "INCLUDE (HireDate, Salary)":

CREATE INDEX NameIndex ON Employee(LastName ASC, FirstName ASC)
       INCLUDE (HireDate, Salary)

then SQL Server is done once it's looked up Joe Murphy in the non-clustered name index --> all the fields to satisfy your query are in the non-clustered index, so there's no more need to do a disk-intensive bookmark lookup and your queries will be potentially much faster.

The downside of INCLUDE columns are increased disk-space need by non-clustered indexes, since they will have the included columns in their leaf-level nodes. It's a trade-off between speed and size (as usual).

Marc

like image 83
marc_s Avatar answered Oct 10 '22 17:10

marc_s