Can you index subqueries?

I have a table and a query like the one below. For a working example, see this SQL Fiddle.

SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN 
(
    SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B
) w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;

With my real data, this query takes 27 seconds. However, if I first create w as a temporary table and index property_B, the whole thing takes about 1 second.

CREATE TEMPORARY TABLE w AS
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B;

ALTER TABLE w ADD INDEX `property_B_idx` (property_B);

SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;

DROP TABLE IF EXISTS w;

Is there a way to combine the best of these two approaches, i.e. a single query that gets the speed advantage of an index on the subquery result?

EDIT

After Mehran's answer below, I read this piece of explanation in the MySQL documentation:

As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):

...

For cases when materialization is required for a subquery in the FROM clause, the optimizer may speed up access to the result by adding an index to the materialized table. If such an index would permit ref access to the table, it can greatly reduce amount of data that must be read during query execution. Consider the following query:

SELECT * FROM t1
  JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;

The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a usual table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If ref access would result in higher cost than some other access method, no index is created and the optimizer loses nothing.

physicalattraction asked Nov 20 '14 14:11


3 Answers

First of all, creating a temporary table is a perfectly feasible solution, but it is best kept for cases where no other option applies, which is not the case here.

In your case, you can easily speed up the query, as FrankPl pointed out, because the subquery and the main query both group by the same field, so you don't need a subquery at all. For the sake of completeness, here is FrankPl's solution:

SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;

That doesn't mean you will never come across a scenario in which you wish you could index a subquery. In that case you have two choices. The first is to use a temporary table holding the results of the subquery, as you pointed out yourself. Its advantage is that MySQL has supported it for a long time. It just isn't feasible when a huge amount of data is involved.

The second solution is to use MySQL 5.6 or above. As of MySQL 5.6.3, when a subquery in the FROM clause has to be materialized, the optimizer may add an index to the materialized result if that permits ref access, so a single query can get the same benefit as the indexed temporary table.
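Whether the optimizer actually chooses such a plan can be checked by running EXPLAIN on the single-query version from the question. This is only a sketch, assuming MySQL 5.6.3 or later and the table o from the question; the outer GROUP BY column is qualified here purely for clarity.

```sql
-- Sketch: show the execution plan of the single-query version.
-- Assumes MySQL 5.6.3+ and the table o from the question.
EXPLAIN
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN
(
    SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B
) w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY o.property_B;
```

If the row for the derived table reports access type ref together with an automatically generated key (shown with a name along the lines of auto_key0, though the exact spelling may differ between versions), the optimizer has built an index on the materialized result. If it reports a full scan instead, the optimizer judged another access method cheaper, or the server predates this optimization.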

[UPDATE]

For the edited version of the question I would recommend the following solution:

SELECT o.property_B, SUM(IF(o.property_A = 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B
HAVING SUM(IF(o.property_A = 'specific_A', o.score1, 0)) > 0;

The HAVING part needs some care, though: as written, it drops every group whose conditional sum is zero, so you might need to change it to fit your actual problem.
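One possible adjustment, as a sketch only: test whether the group contains any 'specific_A' row at all, instead of testing the sum. The table name o_demo and its rows are made up for illustration, and the HAVING condition relies on MySQL evaluating a comparison to 1 or 0.

```sql
-- Hypothetical sample data; o_demo stands in for the real table o.
CREATE TEMPORARY TABLE o_demo (
    property_A VARCHAR(20),
    property_B VARCHAR(20),
    score1 INT,
    score2 INT
);

INSERT INTO o_demo (property_A, property_B, score1, score2) VALUES
    ('specific_A', 'b1', 0, 5),  -- matching row whose score1 is 0
    ('other_A',    'b1', 7, 3),
    ('other_A',    'b2', 4, 2),  -- b2 has no 'specific_A' row at all
    ('specific_A', 'b3', 2, 1),
    ('specific_A', 'b3', 3, 4);

-- Keep a group when it has at least one 'specific_A' row,
-- regardless of what the matching score1 values add up to.
SELECT o.property_B,
       SUM(IF(o.property_A = 'specific_A', o.score1, 0)) AS score1_sum,
       SUM(o.score2) AS score2_sum
FROM o_demo AS o
GROUP BY o.property_B
HAVING SUM(o.property_A = 'specific_A') > 0
ORDER BY o.property_B;

-- Result on the sample data:
--   b1 | 0 | 8
--   b3 | 5 | 5

DROP TEMPORARY TABLE IF EXISTS o_demo;
```

With the original condition, HAVING SUM(IF(...)) > 0, group b1 would be dropped, although the join query from the question returns it with a score1 sum of 0.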

Mehran answered Oct 23 '22 05:10


I am not really that familiar with MySQL; I have mostly worked with Oracle. If you want a WHERE-like condition inside the SUM, you can use decode (in Oracle) or case. It would look something like this:

SELECT o.property_B, SUM(decode(property_A, 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B;

or, with case:

SELECT o.property_B, SUM(CASE
                            WHEN property_A = 'specific_A' THEN o.score1 
                            ELSE 0 
                            END ), 
SUM(o.score2)
FROM o
GROUP BY property_B;

a.j. tawleed answered Oct 23 '22 04:10


I do not see why you would need the join at all. I would assume that

SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;

should give you what you want, with a much simpler statement that is therefore easier to optimize.

FrankPl answered Oct 23 '22 05:10