I just read part of an optimization article and segfaulted on the following statement:
When using SQL, replace statements using OR with a UNION:
select username from users where company = 'bbc' or company = 'itv';
to:
select username from users where company = 'bbc' union select username from users where company = 'itv';
From a quick EXPLAIN:
Using OR:
Using UNION:
Doesn't this mean UNION does double the work?
While I appreciate UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggests.
Am I wrong?
The reason is that using OR in a query will often cause the query optimizer to abandon index seeks and revert to scans. If you look at the execution plans for your two queries, you'll most likely see scans where you are using the OR and seeks where you are using the UNION.
Both the UNION and UNION ALL operators combine rows from several result sets into a single result set. The UNION operator eliminates duplicate rows, whereas the UNION ALL operator does not. Because UNION ALL skips the duplicate-removal step, it generally runs faster than UNION.
A UNION statement effectively does a SELECT DISTINCT on the combined result set. If you know that all the rows returned by your union are already unique, use UNION ALL instead; it gives faster results.
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
The trickier case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc' union select username from users where city = 'London';
Now each subquery can use its own index for its search, and the results of the two subqueries are combined by the UNION.
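To make the two-column case concrete, here is a minimal sketch that runs the OR form, the UNION form and a UNION ALL variant against a tiny table. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example is self-contained), and the sample rows and index names are hypothetical. It shows which rows each form returns, not which index MySQL would pick.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT, city TEXT);
    CREATE INDEX idx_company ON users (company);
    CREATE INDEX idx_city ON users (city);
    INSERT INTO users VALUES
        ('alice', 'bbc', 'London'),      -- matches both conditions
        ('bob',   'bbc', 'Manchester'),  -- matches company only
        ('carol', 'itv', 'London'),      -- matches city only
        ('dave',  'itv', 'Leeds');       -- matches neither
""")

def usernames(sql):
    """Run a query and return the usernames in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

or_rows = usernames(
    "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'")
union_rows = usernames(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION SELECT username FROM users WHERE city = 'London'")
union_all_rows = usernames(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION ALL SELECT username FROM users WHERE city = 'London'")

print(or_rows)         # ['alice', 'bob', 'carol']
print(union_rows)      # ['alice', 'bob', 'carol']
print(union_all_rows)  # ['alice', 'alice', 'bob', 'carol']
```

The row that satisfies both conditions comes back once from OR and from UNION, but twice from UNION ALL. That is why UNION ALL is not a drop-in replacement when the two conditions can overlap.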
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
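As a rough illustration of "try both and compare", the sketch below times the two forms on synthetic data. It again uses Python's sqlite3 module rather than MySQL and its profiler (an assumption made for portability), and the table contents, index names and the `timed` helper are hypothetical. SQLite's planner is not MySQL's, so the timings say nothing about MySQL; only the shape of the comparison carries over.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, company TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_company ON users (company)")
conn.execute("CREATE INDEX idx_city ON users (city)")
# Synthetic rows: 1 in 100 has company 'bbc', 1 in 97 has city 'London'.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    ((f"user{i}",
      "bbc" if i % 100 == 0 else f"co{i % 50}",
      "London" if i % 97 == 0 else f"city{i % 40}")
     for i in range(50_000)),
)

OR_SQL = "SELECT username FROM users WHERE company = 'bbc' OR city = 'London'"
UNION_SQL = ("SELECT username FROM users WHERE company = 'bbc' "
             "UNION SELECT username FROM users WHERE city = 'London'")

def timed(sql, repeats=20):
    """Return (sorted rows, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return sorted(rows), best

or_rows, or_secs = timed(OR_SQL)
union_rows, union_secs = timed(UNION_SQL)
print(f"OR:    {len(or_rows)} rows, {or_secs:.6f}s")
print(f"UNION: {len(union_rows)} rows, {union_secs:.6f}s")
```

Usernames are unique in this synthetic table, so both forms return the same rows and only the timings differ. Against a real MySQL server you would run the same pair of queries on your own data and compare EXPLAIN output and timings there.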
Those are not the same query.
I don't have much experience with MySQL, so I am not sure what the query optimizer does or does not do, but here are my thoughts from my general background (primarily MS SQL Server).
Typically, the query analyzer can take two equivalent queries and produce the exact same plan for them, so it wouldn't matter which one you write. I would suspect that there is no performance difference between these queries (which are equivalent):
select distinct username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc' union select username from users where company = 'itv';
Now, the question is whether there would be a difference between the following queries. I actually don't know, but I would suspect that the optimizer would make it more like the first query:
select username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc' union all select username from users where company = 'itv';
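The equivalences claimed in this answer can be checked on a small example. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption) with hypothetical sample rows, and compares the four queries as multisets of usernames. It only checks that the results match on this data; it says nothing about whether any optimizer produces the same plan.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: 'alice' appears at both companies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT, company TEXT);
    INSERT INTO users VALUES
        ('alice', 'bbc'),
        ('alice', 'itv'),
        ('bob',   'bbc'),
        ('carol', 'sky');
""")

def bag(sql):
    """Return the result as a multiset: username -> number of occurrences."""
    return Counter(row[0] for row in conn.execute(sql))

distinct_or = bag(
    "SELECT DISTINCT username FROM users "
    "WHERE company = 'bbc' OR company = 'itv'")
union = bag(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION SELECT username FROM users WHERE company = 'itv'")
plain_or = bag(
    "SELECT username FROM users WHERE company = 'bbc' OR company = 'itv'")
union_all = bag(
    "SELECT username FROM users WHERE company = 'bbc' "
    "UNION ALL SELECT username FROM users WHERE company = 'itv'")

print(distinct_or == union)   # True: both return each username once
print(plain_or == union_all)  # True: both return 'alice' twice
print(plain_or == union)      # False: UNION drops the duplicate 'alice'
```

The plain OR and UNION ALL results agree here because a single row cannot have company equal to both 'bbc' and 'itv', so no row is matched by both branches.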