Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to choose columns when creating index?

This appear to be a weird question. I know different types of indexes in sql server (clustered, non-clustered, unique, filtered, index with included column(s) ...etc) and I know how to create them. Also I know that the index depend on the query but what I don't know is who choose column when creating the index. For example, suppose a simple website that allow users to post text and images. The website has a simple two tables shown in the image :

How to choose columns when creating index

The query that get user in website is :

Select UserID,UserName from User where Email='something' and Password='something'

Suppose that I want to create index for this table, what column(s) that I should included int the creation of index ? I know that different types of indexes may include different columns but who can I decide when creating clustered or non-clustered which columns should be chosen. I see some examples of indexes that almost always choose the column after the where clause. Is this true ?

The query that gets the posts of user is :

Select * from Posts where UserID='something'

This query is different from the first query. This query may return multiple rows while the first will always return one row. Now the same question, how to choose column ?

What I want to say is how to choose columns when :

  1. Creating clustered index.
  2. Creating non-clustered index.
  3. Creating non-clustered with included column(s).

The above example is just to illustrate the idea of the question. The goal is not to find a good index for the two queries in the example, but to come up with a base that can be used to help in choosing columns when creating index.

like image 824
Ahmed Shamel Avatar asked Jul 03 '15 12:07

Ahmed Shamel


People also ask

How do you decide which column to create an index?

Choose the Order of Columns in Composite Indexes In general, you should put the column expected to be used most often first in the index. You can create a composite index (using several columns), and the same index can be used for queries that reference all of these columns, or just some of them.

How do I SELECT an index column in SQL?

An important point to consider after selecting the proper columns to be involved in the index key is the order of the columns in the index key, especially when the key consists of multiple columns. Try to place the columns that are used in the query conditions first in the SQL Server index key.

How do you SELECT columns for non-clustered index?

Rather than creating a Non-clustered index with a wide key, large columns that are used to cover the query can be included to the Non-clustered index as non-key columns, up to 1023 non-key columns, using the INCLUDE clause of the CREATE INDEX T-SQL statement, that is introduced in SQL Server 2005 version, with a ...

Can I create index on any column?

No, you should not index all of your columns, and there's several reasons for this: There is a cost to maintain each index during an insert, update or delete statement, that will cause each of those transactions to take longer. It will increase the storage required since each index takes up space on disk.


1 Answers

In perfect world, you'd like to index columns, that appear in WHERE clause or JOIN condition. In your case it would be Email and Password columns.

So you could go for a nonclustered index on User table and on Email and Password.

So pretty much this index:

CREATE NONCLUSTERED INDEX idx_User_Email_Password     ON dbo.User (Email, Password); 

So if you will run this query:

SELECT UserID, UserName FROM User WHERE Email = 'something'     AND Password = 'something'; 

You will end up using just created index (most likely) or Clustered index and it will seek trough it. However, your query selects UserID and UserName, which are not included in your index, as a result, your query will do a Key Lookup (it will find records in a created index and will look back at your dbo.User table to find matching values for SELECT statement (UserID and UserName). To avoid that, you could create index with INCLUDED columns to remove a Key Lookup (and you would want to do that).

CREATE NONCLUSTERED INDEX idx_User_Email_Password     ON dbo.User (Email, Password)     INCLUDE (UserID, UserName); 

Using this index you will have a nice NON CLUSTERED INDEX seek in your execution plan.

Also, choosing indexed columns order matters. Let's say, your table would contain UserTypeID (there are not many of them). So you would pass some specific UserTypeIDs and a list of UserIDs, then SQL Server would probably want to pick an index, which has UserTypeID as first indexed column.

So some tests:

CREATE TABLE #Users (     UserId INT     , UserName VARCHAR(500)     , Email VARCHAR(500)     , Password VARCHAR(500) );  CREATE CLUSTERED INDEX idx_Users_UserID     ON #Users (UserID);  -- Some test data from my DB INSERT INTO #Users (UserId, UserName, Email, Password) SELECT TOP (10000) UserId, UserName, Email, 'password' FROM Users; 

So this is the query:

SELECT * FROM #Users; 

This will perform index Scan, since we don't specify any details. enter image description here

Now if we specify UserId it will Seek your Clustered index (we have UserId as key):

SELECT * FROM #Users WHERE UserID = 602; 

enter image description here

Now let's create index without included columns and query something:

CREATE NONCLUSTERED INDEX idx_Users_Email_Password     ON #Users (Email, Password);  SELECT * FROM #Users WHERE Email = '[email protected]'; 

As I've told, it uses created index and does a Key Lookup, it finds matching Email and password and finds rest of the columns in your table to output them (P.S. If you would be ouputting, let's say, only Email, it wouldn't do a Key Lookup, it wouldn't be needed): enter image description here

Now let's create index with included UserName and run query above. It will produce this nice execution plan with plain NonClustered Index seek as I told you before:

CREATE NONCLUSTERED INDEX idx_Users_Email_Password_iUserName     ON #Users (Email, Password)     INCLUDE (UserName); 

enter image description here

This is a high-quality article and I'd recommended reading it: https://www.simple-talk.com/sql/performance/index-selection-and-the-query-optimizer/

like image 63
Evaldas Buinauskas Avatar answered Oct 10 '22 03:10

Evaldas Buinauskas