Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall:

CREATE TABLE contacts
(
    id int PRIMARY KEY,
    phone1 varchar(20) NULL,
    phone2 varchar(20) NULL,
    phone3 varchar(20) NULL,
    phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
    list_id int NOT NULL,
    phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
    list_id ASC,
    phone ASC
);

I would like to find the contacts that have a phone number matching an entry in a specific DoNotCall list. For faster lookup, I have indexed donotcall on list_id and phone.

When I run the following JOIN, it takes a long time (e.g. 9 seconds):

SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)

[Screenshot of execution plan; full plan posted on Pastebin]

By contrast, if I LEFT JOIN on each phone field separately, it runs a lot faster (e.g. 1.5 seconds):

SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
    ON d1.list_id = 1
    AND d1.phone = c.phone1
LEFT JOIN donotcall d2
    ON d2.list_id = 1
    AND d2.phone = c.phone2
LEFT JOIN donotcall d3
    ON d3.list_id = 1
    AND d3.phone = c.phone3
LEFT JOIN donotcall d4
    ON d4.list_id = 1
    AND d4.phone = c.phone4
WHERE
    d1.phone IS NOT NULL
    OR d2.phone IS NOT NULL
    OR d3.phone IS NOT NULL
    OR d4.phone IS NOT NULL

[Screenshot of execution plan; full plan posted on Pastebin]

My assumption is that the first snippet runs slowly because it doesn't use the index on donotcall.
So, how can I join against multiple columns and still have the query use the index?

SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.

You can test if the index would be faster with a hint:

SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
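
Another variant that may be worth timing, offered as a sketch rather than a guaranteed fix: give each phone column its own equality join and combine the branches with UNION. Each branch then has a plain d.phone = c.phoneN predicate that can be matched against IX_donotcall_list_phone, and UNION removes duplicate ids, which takes the place of DISTINCT. Whether the optimizer actually chooses index seeks depends on your data and statistics, so compare the resulting plans.

```sql
-- Sketch: one equality join per phone column, combined with UNION.
-- UNION de-duplicates, so a contact matching on several phones appears once.
-- NULL phone columns never match, same as in the IN (...) version.
SELECT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone = c.phone1
UNION
SELECT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone = c.phone2
UNION
SELECT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone = c.phone3
UNION
SELECT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone = c.phone4;
```

This should return the same set of ids as the SELECT DISTINCT query with IN, so the two can be compared directly for timing.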

The query plans you posted show that the first plan estimates 40k rows but actually returns only 21. The second plan estimates 1 row (and of course returns 21 too).
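
If you want to see estimated and actual row counts side by side without opening the graphical plan, one option is SET STATISTICS PROFILE. This is a minimal sketch using the query from the question; after the query runs, an extra result set is returned with one row per plan operator, including Rows (actual) and EstimateRows columns.

```sql
-- Turn on per-operator profile output for this session.
SET STATISTICS PROFILE ON;

SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
    ON d.list_id = 1
    AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4);

-- Turn it off again so later queries are not affected.
SET STATISTICS PROFILE OFF;
```

A large gap between Rows and EstimateRows on the same operator is the kind of mismatch described above.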

Are your statistics up to date? Out-of-date statistics can explain the query optimizer making bad choices. Statistics should be updated automatically or by a scheduled job, for example weekly. Check the age of your statistics with:

select  object_name(ind.object_id) as TableName
,       ind.name as IndexName
,       stats_date(ind.object_id, ind.index_id) as StatisticsDate
from    sys.indexes ind
order by 
        stats_date(ind.object_id, ind.index_id) desc

You can update them manually with:

EXEC sp_updatestats;
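
sp_updatestats touches every table in the database. If you only want to look at the two tables in this question, a narrower sketch follows. It assumes both tables live in your default schema, and it uses FULLSCAN, which reads every row and can take a while on large tables.

```sql
-- Statistics age for just the two tables from the question.
SELECT  object_name(s.object_id) AS TableName
,       s.name AS StatisticsName
,       stats_date(s.object_id, s.stats_id) AS StatisticsDate
FROM    sys.stats s
WHERE   s.object_id IN (object_id('contacts'), object_id('donotcall'))
ORDER BY
        stats_date(s.object_id, s.stats_id) DESC;

-- Refresh all statistics on each table by scanning every row.
UPDATE STATISTICS contacts WITH FULLSCAN;
UPDATE STATISTICS donotcall WITH FULLSCAN;
```

After updating, rerun the original query and check whether the estimated row count moves closer to the actual 21 rows.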

Tags: join, sql-server, indexing, sql-server-2008

Question by ANisus; 1 answer, by Andomar
