Simplified, I have two tables, contacts
and donotcall
CREATE TABLE contacts
(
id int PRIMARY KEY,
phone1 varchar(20) NULL,
phone2 varchar(20) NULL,
phone3 varchar(20) NULL,
phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
list_id int NOT NULL,
phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
list_id ASC,
phone ASC
);
I would like to see what contacts matches the phone number in a specific list of DoNotCall phone.
For faster lookup, I have indexed donotcall
on list_id
and phone
.
When I make the following JOIN it takes a long time (eg. 9 seconds):
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
Execution plan on Pastebin
While if I LEFT JOIN on each phone field seperately it runs a lot faster (eg. 1.5 seconds):
SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
ON d1.list_id = 1
AND d1.phone = c.phone1
LEFT JOIN donotcall d2
ON d2.list_id = 1
AND d2.phone = c.phone2
LEFT JOIN donotcall d3
ON d3.list_id = 1
AND d3.phone = c.phone3
LEFT JOIN donotcall d4
ON d4.list_id = 1
AND d4.phone = c.phone4
WHERE
d1.phone IS NOT NULL
OR d2.phone IS NOT NULL
OR d3.phone IS NOT NULL
OR d4.phone IS NOT NULL
Execution plan on Pastebin
My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall
.
So, how to do a join towards multiple columns and still have it use the index?
A composite index is an index on multiple columns. MySQL allows you to create a composite index that consists of up to 16 columns. A composite index is also known as a multiple-column index.
An index with more than one column aggregates the contents.
The answer is very simple in most cases: one index with multiple columns is better—that is, a concatenated or compound index. “Concatenated Indexes” explains them in detail.
Indexes can help improve the performance of a nested-loop join in several ways. The biggest benefit often comes when you have a clustered index on the joining column in one of the tables. The presence of a clustered index on a join column frequently determines which table SQL Server chooses as the inner table.
SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4)
using an index is too expensive.
You can test if the index would be faster with a hint:
SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
From the query plans you posted, it shows the first plan is estimated to produce 40k rows, but it just returns 21 rows. The second plan estimates 1 row (and of course returns 21 too.)
Are your statistics up to date? Out-of-date statistics can explain the query analyzer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:
select object_name(ind.object_id) as TableName
, ind.name as IndexName
, stats_date(ind.object_id, ind.index_id) as StatisticsDate
from sys.indexes ind
order by
stats_date(ind.object_id, ind.index_id) desc
You can update them manually with:
EXEC sp_updatestats;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With