I have two tables like this. The 'order' table has 21886 rows.
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=572375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
My question is here.
MariaDB [test]> explain
-> SELECT DISTINCT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 22151 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index; Distinct |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
MariaDB [test]> explain
-> SELECT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> GROUP BY A.id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 65 | Using index; Using temporary |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
Listed above, two queries returns same result but distinct is too slow(explain too many rows). What's the difference?
DISTINCT is used to filter unique records out of all records in the table. It removes the duplicate rows. SELECT DISTINCT will always be the same, or faster than a GROUP BY.
GROUP BY lets you use aggregate functions, like AVG , MAX , MIN , SUM , and COUNT . On the other hand DISTINCT just removes duplicates.
The GROUP BY clause is faster than the SELECT DISTINCT clause (at least in these tests) because it does not require writing to a temporary file on disk.
Well, GROUP BY and DISTINCT have their own use. GROUP BY cannot replace DISTINCT in some situations and DISTINCT cannot take place of GROUP BY. It is as per your choice and situation how you are optimizing both of them and choosing where to use GROUP BY and DISTINCT.
It is usually advised to use DISTINCT
instead of GROUP BY
, since that is what you actually want, and let the optimizer choose the "best" execution plan. However - no optimizer is perfect. Using DISTINCT
the optimizer can have more options for an execution plan. But that also means that it has more options to choose a bad plan.
You write that the DISTINCT
query is "slow", but you don't tell any numbers. In my test (with 10 times as many rows on MariaDB 10.0.19 and 10.3.13) the DISTINCT
query is like (only) 25% slower (562ms/453ms). The EXPLAIN
result is no help at all. It's even "lying". With LIMIT 100, 30
it would need to read at least 130 rows (that's what my EXPLAIN
actually schows for GROUP BY
), but it shows you 65.
I can't explain the 25% difference in execution time, but it seems that the engine is doing a full table/index scan in any case, and sorts the result before it can skip 100 and select 30 rows.
The best plan would probably be:
idx_reg_date
index (table A
) one by one in descending orderidx_order_id
index (table B
)If there are like 10% of rows in A
which have no match in B
, this plan would read something like 143 rows from A
.
Best I could do to somehow force this plan is:
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query returns the same result in 156 ms (3 times faster than GROUP BY
). But that is still too slow. And it's probaly still reading all rows in table A
.
We can proof that a better plan can exist with a "little" subquery trick:
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query executes in "no time" (~ 0 ms) and returns the same result on my test data. And though it's not 100% reliable, it shows that the optimizer is not doing a good job.
So what are my conclusions:
DISTINCT
is not always faster than GROUP BY
drop table if exists `order`;
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into `order`(reg_date)
select from_unixtime(floor(rand(1) * 1000000000)) as reg_date
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 218860;
drop table if exists `order_detail_products`;
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into order_detail_products(id, order_id, order_detail_id, prod_id)
select null as id
, floor(rand(2)*218860)+1 as order_id
, 0 as order_detail_id
, 0 as prod_id
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 437320;
SELECT DISTINCT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 562 ms
SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 453 ms
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 156 ms
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- ~ 0 ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With