Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this simple join query significantly quicker with a sub-query?

I have two tables. order_details which is 100,000 rows, and outbound which is 10,000 rows.

I need to join them on a column called order_number, which is a VARCHAR(50) on both. order_number is not unique in the outbound table.

CREATE TABLE `outbound` (
    `outbound_id` int(12) NOT NULL,
    `order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `order_details` (
    `order_details_id` int(12) NOT NULL,
    `order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This is my initial query, and it takes well over 60 seconds to run:

SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
    ON o.order_number = od.order_number

This query gets the same results and takes less than a second to run:

SELECT o.order_number
FROM outbound o
INNER JOIN
(
    SELECT order_number
    FROM order_details
) od
ON (o.order_number = od.order_number)

This is surprising to me because usually sub-queries are significantly slower.

Running EXPLAIN (which I'm still learning how to understand) shows that the sub query version uses a derived2 table, that it is using an index, and that index is auto_key0. I'm not savvy enough to know how to interpret this to understand why this makes a significant difference.

I am running these queries over command line.

I am running MySQL Ver 14.14 Distrib 5.6.35, for Linux (x86_64) CentOS.

In summary:

Why is this simple join query significantly quicker with a sub-query?

like image 473
Goose Avatar asked Jul 28 '17 13:07

Goose


People also ask

Is subquery faster than two queries?

For subqueries and joins, the data needs to be combined. Small amounts can easily be combined in memory, but if the data gets bigger, then it might not fit, causing the need to swap temporary data to disk, degrading performance. So, there is no general rule to say which one is faster.

What is the difference between sub query and joins?

SQL Joins and Subqueries. An SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. A subquery is a query that is nested inside a SELECT , INSERT , UPDATE , or DELETE statement, or inside another subquery.

What is the purpose of a sub query?

A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.


1 Answers

My knowledge of MySQL is very limited. But these are my thoughts:

Your tables don't have indexes. Then the join has to read the entire second table in order to compare, for each row of the first table.

The subquery reads the second table once and creates an index, then it doesn't need to read the entire second table for each row of the first table. It only has to check the index, which is much more faster.

To verify if I'm ritght or not, try creating indexes for the column order_number in your two tables (CREATE INDEX ... ), and run again this two queries. Your first query should only take less than a second instead of a minute.

like image 149
Egl Avatar answered Oct 06 '22 23:10

Egl