This appear to be a weird question. I know different types of indexes in sql server (clustered, non-clustered, unique, filtered, index with included column(s) ...etc) and I know how to create them. Also I know that the index depend on the query but what I don't know is who choose column when creating the index. For example, suppose a simple website that allow users to post text and images. The website has a simple two tables shown in the image :
The query that get user in website is :
Select UserID,UserName from User where Email='something' and Password='something'
Suppose that I want to create index for this table, what column(s) that I should included int the creation of index ? I know that different types of indexes may include different columns but who can I decide when creating clustered or non-clustered which columns should be chosen. I see some examples of indexes that almost always choose the column after the where clause. Is this true ?
The query that gets the posts of user is :
Select * from Posts where UserID='something'
This query is different from the first query. This query may return multiple rows while the first will always return one row. Now the same question, how to choose column ?
What I want to say is how to choose columns when :
The above example is just to illustrate the idea of the question. The goal is not to find a good index for the two queries in the example, but to come up with a base that can be used to help in choosing columns when creating index.
Choose the Order of Columns in Composite Indexes In general, you should put the column expected to be used most often first in the index. You can create a composite index (using several columns), and the same index can be used for queries that reference all of these columns, or just some of them.
An important point to consider after selecting the proper columns to be involved in the index key is the order of the columns in the index key, especially when the key consists of multiple columns. Try to place the columns that are used in the query conditions first in the SQL Server index key.
Rather than creating a Non-clustered index with a wide key, large columns that are used to cover the query can be included to the Non-clustered index as non-key columns, up to 1023 non-key columns, using the INCLUDE clause of the CREATE INDEX T-SQL statement, that is introduced in SQL Server 2005 version, with a ...
No, you should not index all of your columns, and there's several reasons for this: There is a cost to maintain each index during an insert, update or delete statement, that will cause each of those transactions to take longer. It will increase the storage required since each index takes up space on disk.
In perfect world, you'd like to index columns, that appear in WHERE
clause or JOIN
condition. In your case it would be Email
and Password
columns.
So you could go for a nonclustered index on User table and on Email and Password.
So pretty much this index:
CREATE NONCLUSTERED INDEX idx_User_Email_Password ON dbo.User (Email, Password);
So if you will run this query:
SELECT UserID, UserName FROM User WHERE Email = 'something' AND Password = 'something';
You will end up using just created index (most likely) or Clustered index and it will seek trough it. However, your query selects UserID and UserName, which are not included in your index, as a result, your query will do a Key Lookup (it will find records in a created index and will look back at your dbo.User
table to find matching values for SELECT statement (UserID and UserName). To avoid that, you could create index with INCLUDED
columns to remove a Key Lookup (and you would want to do that).
CREATE NONCLUSTERED INDEX idx_User_Email_Password ON dbo.User (Email, Password) INCLUDE (UserID, UserName);
Using this index you will have a nice NON CLUSTERED INDEX seek in your execution plan.
Also, choosing indexed columns order matters. Let's say, your table would contain UserTypeID (there are not many of them). So you would pass some specific UserTypeIDs and a list of UserIDs, then SQL Server would probably want to pick an index, which has UserTypeID as first indexed column.
So some tests:
CREATE TABLE #Users ( UserId INT , UserName VARCHAR(500) , Email VARCHAR(500) , Password VARCHAR(500) ); CREATE CLUSTERED INDEX idx_Users_UserID ON #Users (UserID); -- Some test data from my DB INSERT INTO #Users (UserId, UserName, Email, Password) SELECT TOP (10000) UserId, UserName, Email, 'password' FROM Users;
So this is the query:
SELECT * FROM #Users;
This will perform index Scan, since we don't specify any details.
Now if we specify UserId it will Seek your Clustered index (we have UserId as key):
SELECT * FROM #Users WHERE UserID = 602;
Now let's create index without included columns and query something:
CREATE NONCLUSTERED INDEX idx_Users_Email_Password ON #Users (Email, Password); SELECT * FROM #Users WHERE Email = '[email protected]';
As I've told, it uses created index and does a Key Lookup, it finds matching Email and password and finds rest of the columns in your table to output them (P.S. If you would be ouputting, let's say, only Email, it wouldn't do a Key Lookup, it wouldn't be needed):
Now let's create index with included UserName and run query above. It will produce this nice execution plan with plain NonClustered Index seek as I told you before:
CREATE NONCLUSTERED INDEX idx_Users_Email_Password_iUserName ON #Users (Email, Password) INCLUDE (UserName);
This is a high-quality article and I'd recommended reading it: https://www.simple-talk.com/sql/performance/index-selection-and-the-query-optimizer/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With