I've been studying indexes and trying to understand how they work and how I can use them to boost performance, but I'm missing something.
I have the following table:
Person:
| Id | Name | Email | Phone |
| 1 | John | E1 | P1 |
| 2 | Max | E2 | P2 |
I'm trying to find the best way to index the columns Email
and Phone
considering that the queries will (most of the time) be of the form
[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = ...
[3] SELECT * FROM Person WHERE Phone = ...
I thought the best approach would be to create a single index using both columns:
CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [PhoneNumber]);
However, with the index above, only the query [2] benefits from an index seek, the others use index scan.
I also tried to create multiple index: one with both columns, one for email, and one for email. In this case, [2] and [3] use seek, but [1] continues to use scan.
Why can't the database use index with an or? What would be the best indexing approach for this table considering the queries?
Use two separate indexes, one on (email)
and one on (phone, email)
.
The OR
is rather difficult. If your conditions were connected by AND
rather than OR
, then your index would be used for the first query (but not the third, because phone
is not the first key in the index).
You can write the query as:
SELECT *
FROM Person
WHERE Email = '...'
UNION ALL
SELECT *
FROM Person
WHERE Email <> '...' AND Phone = '...';
SQL Server should use the appropriate index for each subquery.
Create a separate index for each column.
By using hints we can force the optimizer to use/not use the indexes, so you can check the execution plan, get a feeling of the performance involved and understand the meaning of each path.
Go through my demo and consider the work involved in each path for the following scenarios -
Only few rows satisfy the condition j=123.
Only few rows satisfy the condition k=456.
Most of the rows satisfy the condition j=123.
Most of the rows satisfy the condition k=456.
Only few rows satisfy the condition j=123.
Most of the rows satisfy the condition k=456.
Try to think what path you would have chosen for each scenario.
Please feel free to ask questions.
Demo
;with t(n) as (select 0 union all select n+1 from t where n < 999)
select 1+t0.n+1000*t1.n as i
,floor(rand(cast (newid() as varbinary))*1000) as j
,floor(rand(cast (newid() as varbinary))*1000) as k
into t
from t t0,t t1
option (maxrecursion 0)
;
create index t_j on t (j);
create index t_k on t (k);
update statistics t (t_j)
update statistics t (t_k)
select *
from t (forcescan)
where j = 123
or k = 456
select *
from t (forceseek)
where j = 123
or k = 456
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With