I have two tables. order_details
which is 100,000 rows, and outbound
which is 10,000 rows.
I need to join them on a column called order_number
, which is a VARCHAR(50) on both. order_number is not unique in the outbound table.
CREATE TABLE `outbound` (
`outbound_id` int(12) NOT NULL,
`order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `order_details` (
`order_details_id` int(12) NOT NULL,
`order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is my initial query, and it takes well over 60 seconds to run:
SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
ON o.order_number = od.order_number
This query gets the same results and takes less than a second to run:
SELECT o.order_number
FROM outbound o
INNER JOIN
(
SELECT order_number
FROM order_details
) od
ON (o.order_number = od.order_number)
This is surprising to me because usually sub-queries are significantly slower.
Running EXPLAIN
(which I'm still learning how to understand) shows that the sub query version uses a derived2
table, that it is using an index, and that index is auto_key0
. I'm not savvy enough to know how to interpret this to understand why this makes a significant difference.
I am running these queries over command line.
I am running MySQL Ver 14.14 Distrib 5.6.35, for Linux (x86_64) CentOS.
In summary:
Why is this simple join query significantly quicker with a sub-query?
For subqueries and joins, the data needs to be combined. Small amounts can easily be combined in memory, but if the data gets bigger, then it might not fit, causing the need to swap temporary data to disk, degrading performance. So, there is no general rule to say which one is faster.
SQL Joins and Subqueries. An SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. A subquery is a query that is nested inside a SELECT , INSERT , UPDATE , or DELETE statement, or inside another subquery.
A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
My knowledge of MySQL is very limited. But these are my thoughts:
Your tables don't have indexes. Then the join has to read the entire second table in order to compare, for each row of the first table.
The subquery reads the second table once and creates an index, then it doesn't need to read the entire second table for each row of the first table. It only has to check the index, which is much more faster.
To verify if I'm ritght or not, try creating indexes for the column order_number in your two tables (CREATE INDEX ... ), and run again this two queries. Your first query should only take less than a second instead of a minute.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With