Speeding up Joins with Indexes

Tags:

indexing

I understand that using indexes can help speed up joins of two or more tables. The following example joins two tables, emps and depts, using their shared department_id column:

select last_name, department_name 
from emps join depts 
using(department_id);

My question is: would indexing the department_id column on one of the two tables speed up this query, or would I have to create an index on both department_id columns from both tables in order to see an improvement in performance?

363

asked Jun 14 '18 12:06

JTruant

1 Answer

The depts table will already have an index on department_id, as this should be its primary key. In emps, department_id should be a foreign key; some DBMS (MySQL's InnoDB, for instance) index foreign key columns automatically, while others (Oracle, PostgreSQL, SQL Server) do not, so there you would have to create that index yourself.

In your query, though, it is rather unlikely that the indexes will be used. Why should the DBMS bother to traverse index trees when it has to read all the records anyway? Simple sequential full table scans followed by, for instance, a hash join will usually be much faster.
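To make "full table scans and then a join on hashes" concrete, here is a minimal sketch of a hash join in Python. It is only an illustration of the idea, not how any particular DBMS implements it; the function name hash_join and the sample rows are made up for this example.

```python
from collections import defaultdict

def hash_join(emps, depts):
    """Join emps to depts on department_id with one full pass over each table."""
    # Build phase: read every depts row once and hash it by the join key.
    depts_by_id = defaultdict(list)
    for dept in depts:
        depts_by_id[dept["department_id"]].append(dept)

    # Probe phase: read every emps row once and look up its matching depts rows.
    result = []
    for emp in emps:
        key = emp["department_id"]
        if key is None:
            continue  # as in SQL, a NULL join key matches nothing
        for dept in depts_by_id.get(key, []):
            result.append((emp["last_name"], dept["department_name"]))
    return result

# Hypothetical sample rows, for illustration only.
depts = [
    {"department_id": 10, "department_name": "Sales"},
    {"department_id": 20, "department_name": "IT"},
]
emps = [
    {"last_name": "Smith", "department_id": 10},
    {"last_name": "Jones", "department_id": 20},
    {"last_name": "Brown", "department_id": 10},
]

joined = hash_join(emps, depts)
```

Every row of both tables is read exactly once and no index is touched, which is why an index on department_id adds little when the query needs all rows anyway.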

Let's look at another example:

select e.last_name, d.department_name 
from emps e
join depts d on d.department_id  = e.department_id
where e.first_name = 'Laura';

Here, we are only interested in a few employees. This is where indexes come into play. We'll want an index on emps(first_name). With it we find the employee record, read its department_id, and can then access the associated depts record.

But having said this, we notice that we use the index to look up the table record just to get the department_id. Wouldn't it be faster to get the department_id right from the index? Yes, it would. So the index should be on emps(first_name, department_id).

The depts primary key is department_id, so this column is indexed, and we can easily find the depts record with the department name.
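Whether a DBMS really takes this access path can be checked by asking it for the query plan. The sketch below does this with SQLite through Python's sqlite3 module, purely as a stand-in: the table definitions (including the employee_id column) are assumptions made for the example, and other DBMS have their own explain syntax and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table definitions; only the column names used in the
    -- query come from the question.
    create table depts (
        department_id   integer primary key,
        department_name text
    );
    create table emps (
        employee_id   integer primary key,
        first_name    text,
        last_name     text,
        department_id integer references depts(department_id)
    );
    create index idx_emps on emps(first_name, department_id);
""")

query = """
    select e.last_name, d.department_name
    from emps e
    join depts d on d.department_id = e.department_id
    where e.first_name = 'Laura'
"""

# The last column of each plan row describes one step in plain text.
plan_steps = [row[3] for row in conn.execute("explain query plan " + query)]
for step in plan_steps:
    print(step)
```

The printed steps should mention idx_emps for the emps lookup and a primary key lookup for depts; the exact wording depends on the SQLite version.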

But we can ask the same question again: Can't we get the name right from the index, too? This leads us to covering indexes that contain all columns used in a query.

So, while

create index idx_emps on emps(first_name, department_id);
create index idx_depts on depts(department_id);

suffice, we can make the query faster still with these covering indexes:

create index idx_emps on emps(first_name, department_id, last_name);
create index idx_depts on depts(department_id, department_name);
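As a rough check of the covering idea, the following sketch creates the two covering indexes in SQLite (again only a stand-in, with hypothetical table definitions and a made-up sample row) and looks at the plan. SQLite labels an index that answers the query without touching the table as a "covering index".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table definitions and sample data, for illustration only.
    create table depts (
        department_id   integer primary key,
        department_name text
    );
    create table emps (
        employee_id   integer primary key,
        first_name    text,
        last_name     text,
        department_id integer references depts(department_id)
    );
    create index idx_emps on emps(first_name, department_id, last_name);
    create index idx_depts on depts(department_id, department_name);

    insert into depts values (10, 'Sales');
    insert into emps values (1, 'Laura', 'Palmer', 10);
    insert into emps values (2, 'Dale', 'Cooper', 10);
""")

query = """
    select e.last_name, d.department_name
    from emps e
    join depts d on d.department_id = e.department_id
    where e.first_name = 'Laura'
"""

# The last column of each plan row describes one step in plain text.
plan_steps = [row[3] for row in conn.execute("explain query plan " + query)]
rows = conn.execute(query).fetchall()
```

For emps, the plan should report idx_emps as a covering index. For depts, SQLite may still prefer its built-in primary key lookup over idx_depts, which is one reason to always look at the plan of the DBMS you actually use before adding wider indexes.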

125

answered Nov 01 '22 18:11

Thorsten Kettner
