
Speeding up Joins with Indexes

Tags: sql, indexing

I understand that using indexes can help speed up joins of two or more tables. The following example joins two tables, emps and depts, using their shared department_id column:

select last_name, department_name 
from emps join depts 
using(department_id);

My question is: would indexing the department_id column on one of the two tables speed up this query, or would I have to create an index on both department_id columns from both tables in order to see an improvement in performance?

Asked Jun 14 '18 12:06 by JTruant




1 Answer

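depts.department_id should be the depts primary key, so it is indexed already. emps.department_id should be a foreign key referencing it; whether that column gets an index automatically depends on the DBMS (MySQL's InnoDB creates one, while Oracle, PostgreSQL, and SQL Server do not).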

In your query, though, it is rather unlikely that the indexes will be used. Every row of both tables has to be read anyway, so why should the DBMS bother traversing index trees? Sequential full table scans followed by, for instance, a hash join will usually be much faster.
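To illustrate why no index is needed when all rows are read, here is a minimal Python sketch of what a hash join does. It is not any particular DBMS's implementation, and the sample rows are made up for the illustration.

```python
# Illustrative sketch of a hash join; the sample rows are made up.
depts = [(10, "Sales"), (20, "Engineering")]  # (department_id, department_name)
emps = [("Smith", 10), ("Jones", 20), ("Miller", 10)]  # (last_name, department_id)


def hash_join(emps, depts):
    # Build phase: one sequential pass over depts fills an in-memory hash table.
    name_by_id = {dept_id: name for dept_id, name in depts}
    # Probe phase: one sequential pass over emps, one hash lookup per row.
    # Rows without a matching department are dropped, as in an inner join.
    return [
        (last_name, name_by_id[dept_id])
        for last_name, dept_id in emps
        if dept_id in name_by_id
    ]


result = hash_join(emps, depts)
print(result)
```

Each table is read exactly once from start to end, which is why an index on department_id would not help here.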

Let's look at another example:

select e.last_name, d.department_name 
from emps e
join depts d on d.department_id  = e.department_id
where e.first_name = 'Laura';

Here, we are only interested in a few employees. This is where indexes come into play. We'll want an index on emps(first_name). With it we find the matching employee records, read their department_id, and can then access the associated depts record.

Having said this, we notice that we use the index only to find the table record, which we then read just to get the department_id. Wouldn't it be faster to get the department_id right from the index? Yes, it would. So the index should be on emps(first_name, department_id).

The depts primary key is department_id, so this column is indexed, and we can easily find the depts record with the department name.
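One way to check whether such an index is picked up is to look at the execution plan. The following sketch uses SQLite through Python's sqlite3 module purely as a convenient stand-in; other DBMSs have their own EXPLAIN syntax and may decide differently. The employee_id column and the plan helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depts (department_id integer primary key, department_name text);
    create table emps (
        employee_id integer primary key,  -- hypothetical key column
        first_name text,
        last_name text,
        department_id integer references depts(department_id)
    );
""")

QUERY = """
    select e.last_name, d.department_name
    from emps e
    join depts d on d.department_id = e.department_id
    where e.first_name = 'Laura'
"""


def plan(connection, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step description.
    rows = connection.execute("explain query plan " + sql).fetchall()
    return "\n".join(row[-1] for row in rows)


plan_before = plan(conn, QUERY)
conn.execute("create index idx_emps on emps(first_name, department_id)")
plan_after = plan(conn, QUERY)

print("without index:\n" + plan_before)
print("with index:\n" + plan_after)
```

The exact wording of the plan differs between SQLite versions, but after the index is created the plan should mention idx_emps for the access to emps.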

But we can ask the same question again: Can't we get the name right from the index, too? This leads us to covering indexes that contain all columns used in a query.

So, while

create index idx_emps on emps(first_name, department_id);
-- usually unnecessary: the primary key already indexes depts(department_id)
create index idx_depts on depts(department_id);

suffice, we can make the query faster still with these covering indexes:

create index idx_emps on emps(first_name, department_id, last_name);
create index idx_depts on depts(department_id, department_name);
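As a rough check of the covering-index idea, this sketch again uses SQLite via Python's sqlite3 module as a stand-in; the sample rows are invented. SQLite labels an access that never touches the table as a "COVERING INDEX" step. For depts it may well prefer the primary key over idx_depts, so the sketch only looks at emps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depts (department_id integer primary key, department_name text);
    create table emps (first_name text, last_name text, department_id integer);
    create index idx_emps on emps(first_name, department_id, last_name);
    create index idx_depts on depts(department_id, department_name);
    -- made-up sample data
    insert into depts values (10, 'Sales');
    insert into emps values ('Laura', 'Smith', 10), ('Peter', 'Jones', 10);
""")

QUERY = """
    select e.last_name, d.department_name
    from emps e
    join depts d on d.department_id = e.department_id
    where e.first_name = 'Laura'
"""

rows = conn.execute(QUERY).fetchall()
# The last column of each EXPLAIN QUERY PLAN row describes one plan step.
plan_steps = [row[-1] for row in conn.execute("explain query plan " + QUERY)]

print(rows)
for step in plan_steps:
    print(step)
```

Because first_name, department_id, and last_name are all stored in idx_emps, the query can be answered for emps without reading the table itself.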
Answered Nov 01 '22 18:11 by Thorsten Kettner