 

Make use of index when JOIN'ing against multiple columns

In simplified form, I have two tables, contacts and donotcall:

CREATE TABLE contacts
(
    id int PRIMARY KEY,
    phone1 varchar(20) NULL,
    phone2 varchar(20) NULL,
    phone3 varchar(20) NULL,
    phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
    list_id int NOT NULL,
    phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
    list_id ASC,
    phone ASC
);

I would like to find which contacts have a phone number that appears in a specific donotcall list. For faster lookup, I have indexed donotcall on list_id and phone.

When I run the following JOIN, it takes a long time (e.g. 9 seconds):

SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)  

Screenshot of execution plan

Execution plan on Pastebin

Whereas if I LEFT JOIN on each phone field separately, it runs much faster (e.g. 1.5 seconds):

SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
    ON d1.list_id = 1
    AND d1.phone = c.phone1
LEFT JOIN donotcall d2
    ON d2.list_id = 1
    AND d2.phone = c.phone2
LEFT JOIN donotcall d3
    ON d3.list_id = 1
    AND d3.phone = c.phone3
LEFT JOIN donotcall d4
    ON d4.list_id = 1
    AND d4.phone = c.phone4
WHERE
    d1.phone IS NOT NULL
    OR d2.phone IS NOT NULL
    OR d3.phone IS NOT NULL
    OR d4.phone IS NOT NULL

Screenshot of execution plan

Execution plan on Pastebin
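The same idea, one equality predicate per phone column, can also be sketched as a UNION of four inner joins. This is an illustrative rewrite, not something from the original post, and whether each branch actually gets an index seek is up to the optimizer. UNION (rather than UNION ALL) removes duplicate ids when a contact matches on more than one column, mirroring the DISTINCT in the first query.

```sql
-- Each branch compares donotcall.phone to a single contacts column with "=",
-- which gives the optimizer the option of a seek on IX_donotcall_list_phone (list_id, phone).
SELECT c.id
FROM contacts c
JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone1
UNION  -- UNION (not UNION ALL) removes duplicate ids across branches
SELECT c.id
FROM contacts c
JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone2
UNION
SELECT c.id
FROM contacts c
JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone3
UNION
SELECT c.id
FROM contacts c
JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone4;
```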

My assumption is that the first query runs slowly because it doesn't use the index on donotcall.
So how can I join against multiple columns and still have the query use the index?
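One common way to turn the multi-column match into a single equality join is to unpivot the four phone columns with CROSS APPLY and a VALUES row constructor (SQL Server 2008 and later). This is a sketch, not taken from the thread; whether the optimizer then seeks on IX_donotcall_list_phone still depends on its estimates, so check the actual plan. NULL phone columns simply produce no match, since NULL never satisfies "=".

```sql
SELECT DISTINCT c.id
FROM contacts c
-- Turn phone1..phone4 into four rows, one phone number per row.
CROSS APPLY (VALUES (c.phone1), (c.phone2), (c.phone3), (c.phone4)) AS p(phone)
-- A single equality on (list_id, phone) lines up with the index key order.
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone = p.phone;
```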

asked Sep 04 '13 12:09 by ANisus




1 Answer

SQL Server may estimate that resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) with an index seek is too expensive.

You can test whether using the index would be faster with a hint:

SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
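If the goal is specifically to require a seek rather than just name the index, SQL Server 2008 and later also accept the FORCESEEK table hint. A minimal variant for testing only (not from the original answer): table hints override the optimizer, and if it cannot build a seek plan the query fails with an error instead of falling back.

```sql
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d WITH (FORCESEEK)  -- require an index seek on donotcall
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4);
```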

The query plans you posted show that the first plan is estimated to produce 40k rows but actually returns only 21. The second plan estimates 1 row (and of course also returns 21).
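To compare estimated and actual row counts per operator yourself, one option (not mentioned in the original answer) is SET STATISTICS PROFILE, whose output includes both a Rows column (actual) and an EstimateRows column for each plan operator:

```sql
SET STATISTICS PROFILE ON;  -- return the executed plan with actual and estimated row counts

SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4);

SET STATISTICS PROFILE OFF;
```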

Are your statistics up to date? Out-of-date statistics can lead the query optimizer to make bad choices. Statistics should be kept current, either automatically or by a scheduled (e.g. weekly) job. Check the age of your statistics with:

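-- Last statistics update for each index on user tables, most recent first
select  object_name(ind.object_id) as TableName
,       ind.name as IndexName
,       stats_date(ind.object_id, ind.index_id) as StatisticsDate
from    sys.indexes ind
where   objectproperty(ind.object_id, 'IsUserTable') = 1
order by 
        stats_date(ind.object_id, ind.index_id) desc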
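To see what the optimizer actually knows about the donotcall index, not just when it was last updated, DBCC SHOW_STATISTICS prints the statistics header, density vector and histogram. Note that the histogram covers only the leading key column, list_id. The dbo schema below is an assumption; adjust it to wherever the table lives.

```sql
-- Header (including Updated, Rows, Rows Sampled), density vector and histogram
-- for the statistics behind IX_donotcall_list_phone. Schema dbo is assumed.
DBCC SHOW_STATISTICS ('dbo.donotcall', IX_donotcall_list_phone);
```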

You can update them manually with:

EXEC sp_updatestats;
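sp_updatestats works across the current database and uses default sampling. To refresh only the two tables involved here, reading every row instead of a sample, a narrower sketch (again assuming the dbo schema) is:

```sql
-- Rebuild statistics by scanning all rows rather than a sample.
UPDATE STATISTICS dbo.contacts WITH FULLSCAN;
UPDATE STATISTICS dbo.donotcall WITH FULLSCAN;
```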
answered Sep 24 '22 19:09 by Andomar