Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are UNION queries so slow in MySQL?

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run. Also, a UNION ALL takes longer than a UNION DISTINCT.

I would assume allowing duplicates would make the query run faster and not slower. Am I really just better off running the 2 queries separately? I would prefer to use the UNION.

As a simple example if I do

SELECT name FROM t1 WHERE field1 = true

it takes .001 seconds

and if I do

SELECT name FROM t1 WHERE field1 = false

it takes .1 seconds.

If I then run

SELECT name FROM t1 WHERE field1 = true 
UNION ALL 
SELECT name FROM t1 WHERE field1 = false

it takes over 1 second.

like image 817
Greg Avatar asked May 15 '09 17:05

Greg


4 Answers

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run.

Do your queries include ORDER BY … LIMIT clauses?

If you put an ORDER BY … LIMIT after a UNION, it gets applied to the whole UNION, and indexes cannot be used in this case.

If id is a primary key, this query will be instant:

SELECT  *
FROM    table
ORDER BY id
LIMIT 1

, but this one will not:

SELECT  *
FROM    table
UNION ALL
SELECT  *
FROM    table
ORDER BY id
LIMIT 1

Also, a UNION ALL takes longer than a UNION DISTINCT. I would assume allowing duplicates would make the query run faster and not slower.

This also seems to be due to ORDER BY. Sorting a smaller set is faster than a larger one.

Am I really just better off running the 2 queries separately? I would prefer to use the UNION

Do you need the resulting set to be sorted?

If not, just get rid of the final ORDER BY.

like image 53
Quassnoi Avatar answered Oct 11 '22 15:10

Quassnoi


A guess: Since you query one table with 2 unions, it might be,that mysql has difficulties to decide on a locking strategy for the table, or it tries some caching, that doesn't work here since you query for disjoint sets, tries to multithread the access (very reasonable) but runs into some locking/concurrency/file-seeking issues..

unions might also generally employ a higher safety setting, since these two selects have to be consistent. If you put them into separate transactions, they do not.

Experiment: Make a duplicate of the table and union those. If I'm right, it should be faster.

Possible solution: Split the single file into multiple files, to allow for better concurrency strategies. This wouldn't/shouldn't help with locking issues, but rules out the multithreading/seeking problems in the database.

It would be useful to know, which storage engine you use.

Well just my 2 cents. Can't test this here right now.

like image 34
AndreasT Avatar answered Oct 11 '22 15:10

AndreasT


Concerning UNION

  • UNION DISTINCT (the default for UNION) is necessarily slower because it must collect the two results, then dedup. However, since there is less being returned to the client, there could some compensation there.
  • Until recent versions, all UNIONs involved a temp table to collect the results, hence UNION was necessarily slower than two separate SELECTs. Recently, (MySQL 5.7, MariaDB 10.1) some cases of UNION ALLwere improved to deliver the data from one SELECT straight to the client, then deliver the others.
  • SELECT .. UNION SELECT .. ORDER BY .. is equivalent to
    (SELECT .. UNION SELECT ..) ORDER BY .. -- this
    (SELECT ..) UNION (SELECT .. ORDER BY ..) -- not this
    Suggest always using parens around each SELECT.
  • Sorting (via ORDER BY) may take extra time for whatever (select or union) it is attached to. It is very unlikely to take less time. Simply put, the Optimizer aims to do whatever is fastest, which might happen to be sorted.
  • All these statements apply to InnoDB; MyISAM, which is not supported much, may be missing some recent optimizations.
  • Sometimes a single SELECT with an OR can be sped up by turning it into a UNION, thereby letting two indexes be used.

Concerning testing:

  • A time of 0.001 smells like you have run the query before and the result is cached in the "Query cache". Avoid that by turning off the QC or adding SQL_NO_CACHE.
  • Selecting WHERE flag = true (or false) has several cases: Is flag indexed? Is flag almost always one of those values? In that case, the index will be used, and probably faster than the other case.
  • If you disagree with any of my statements, please provide a working example to the contrary.
like image 5
Rick James Avatar answered Oct 11 '22 15:10

Rick James


UNION ALL is faster than UNION because plain UNION is expecting that within two joined datasets are duplicates which need to be removed. If you can ensure (by inner WHERE clauses) that there will be no duplicates, it's far better to use UNION ALL and let database engine optimize the inner selects.

Using a WHERE clause on the result of grouped results is too expensive because you are operating on more internal results than you need. Also, the database engine's optimization can’t be processed — the results don't have anything in common.

check this link for details https://dzone.com/articles/performance-tip-for-tuning-sql-with-union

like image 1
Mohamed El-Touny Avatar answered Oct 11 '22 13:10

Mohamed El-Touny