I'm using PostgreSQL 9.3 and have the following tables (simplified to only show the relevant fields):
SITES:
id
name
...
DEVICES:
id
site_id
mac_address UNIQUE
...
Given the mac_address of a particular device, I want to get the details of the associated site. I have the following two queries:
Using LEFT JOIN:
SELECT s.* FROM sites s
LEFT JOIN devices d ON s.id = d.site_id
WHERE d.mac_address = '00:00:00:00:00:00';
Using SUBQUERY:
SELECT s.* FROM sites s
WHERE s.id IN (SELECT d.site_id FROM devices d WHERE d.mac_address = '00:00:00:00:00:00');
Which of the two queries would perform best on an indefinitely growing database? I have always leaned towards the LEFT JOIN option, but would be interested to know how the two compare on a large data set.
I won't leave you in suspense: between joins and subqueries, joins tend to execute faster. In many cases, a query written with a join will outperform an equivalent query that uses a subquery. The reason is that a join reduces the processing burden on the database by replacing multiple queries with a single join query.
The more data the tables hold, the slower subqueries tend to become; with small tables, subqueries run about as fast as joins. On the other hand, subqueries are often simpler, easier to understand, and easier to read.
Is a LEFT JOIN slower than an INNER JOIN? It can be, because a LEFT JOIN does more work: it must also keep the rows from the left table that have no match on the right.
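In the question's query specifically, the WHERE filter on d.mac_address discards any site with no matching device (those rows would carry NULL in d.mac_address), so the LEFT JOIN returns the same rows as an inner join, and PostgreSQL's planner can typically reduce it to one on its own. Writing the inner join explicitly makes that intent visible. A sketch using the question's table and column names and its placeholder MAC address; because mac_address is UNIQUE, at most one device matches, so no site is returned twice:

```sql
-- Same result as the LEFT JOIN version: the WHERE condition on
-- d.mac_address already rejects the NULL-extended rows an outer join adds.
SELECT s.*
FROM sites s
JOIN devices d ON d.site_id = s.id
WHERE d.mac_address = '00:00:00:00:00:00';
```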
A subquery does not necessarily hurt performance.
It generally won't make any difference, because both queries should result in the same query plan. An EXISTS subquery reliably will; IN isn't always optimised as intelligently.
For the subquery, rather than using IN (...), you should generally prefer EXISTS (...):
SELECT s.*
FROM sites s
WHERE EXISTS (
    SELECT 1
    FROM devices d
    WHERE d.mac_address = '00:00:00:00:00:00'
      AND d.site_id = s.id
);
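Whether the forms really end up with the same plan is easy to check on your own data. A minimal sketch, assuming the question's sites and devices tables and its placeholder MAC address: run each version through EXPLAIN with the ANALYZE and BUFFERS options (both available in 9.3) and compare the plans. Note that ANALYZE actually executes each query.

```sql
-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.

-- 1. LEFT JOIN form from the question
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.* FROM sites s
LEFT JOIN devices d ON s.id = d.site_id
WHERE d.mac_address = '00:00:00:00:00:00';

-- 2. IN (subquery) form from the question
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.* FROM sites s
WHERE s.id IN (SELECT d.site_id FROM devices d
               WHERE d.mac_address = '00:00:00:00:00:00');

-- 3. EXISTS form
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.* FROM sites s
WHERE EXISTS (SELECT 1 FROM devices d
              WHERE d.mac_address = '00:00:00:00:00:00'
                AND d.site_id = s.id);
```

If the three plans have the same shape, the queries should scale the same way as the tables grow; timings measured on a small test table say little about behaviour on a large one.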