I have a table and a query that look like the ones below. For a working example, see this SQL Fiddle.
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN
(
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B
) w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;
With my real data, this query takes 27 seconds. However, if I first create w as a temporary table and index property_B, the whole thing takes about 1 second.
CREATE TEMPORARY TABLE w AS
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B;
ALTER TABLE w ADD INDEX `property_B_idx` (property_B);
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;
DROP TABLE IF EXISTS w;
Is there a way to combine the best of these two queries? I.e. a single query with the speed advantages of the indexing in the subquery?
After Mehran's answer below, I read this piece of explanation in the MySQL documentation:
As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):
...
For cases when materialization is required for a subquery in the FROM clause, the optimizer may speed up access to the result by adding an index to the materialized table. If such an index would permit ref access to the table, it can greatly reduce amount of data that must be read during query execution. Consider the following query:
SELECT * FROM t1
JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;
The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a usual table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If ref access would result in higher cost than some other access method, no index is created and the optimizer loses nothing.
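One way to check whether the optimizer actually does this for the query in question is to look at its execution plan. The following is only a sketch, assuming MySQL 5.6.3 or later and the table o from above. Adding w.score to the GROUP BY is my own change so that the statement also runs when ONLY_FULL_GROUP_BY is enabled; it does not change the result, because w holds one row per property_B.

```sql
-- Prefix the original query with EXPLAIN to see the chosen plan.
EXPLAIN
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN (
    SELECT property_B, SUM(score2) AS score
    FROM o
    GROUP BY property_B
) AS w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY o.property_B, w.score;
-- Look at the plan row for the derived table (listed as <derivedN>).
-- Access type "ref" with a key named like <auto_key0> suggests that the
-- optimizer generated an index on the materialized result.
-- Access type "ALL" suggests that it did not.
```

If the derived table is still scanned in full, the temporary table with an explicit index remains the fallback.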
First of all, you need to know that creating a temporary table is an absolutely feasible solution. It is best kept for cases where no other choice is applicable, though, which is not true here!
In your case, you can easily boost your query as FrankPl pointed out because your sub-query and main-query are both grouping by the same field. So you don't need any sub-queries. I'm going to copy and paste FrankPl's solution for the sake of completeness:
SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;
That doesn't mean you will never come across a scenario in which you wish you could index a sub-query. In such cases you've got two choices. The first is using a temporary table to hold the results of the sub-query, as you pointed out yourself. This solution has the advantage that MySQL has supported it for a long time. It's just not feasible if there's a huge amount of data involved.
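As a side note on the temporary-table route, MySQL lets you declare the index inside CREATE TABLE ... SELECT, so the separate ALTER TABLE step can be folded in. This is a sketch based on the table and column names from the question, not something I have timed against the real data.

```sql
-- Create the temporary table and its index in one statement.
CREATE TEMPORARY TABLE w (INDEX property_B_idx (property_B))
AS
SELECT property_B, SUM(score2) AS score
FROM o
GROUP BY property_B;

-- The join is the same as in the question; w.score is added to the
-- GROUP BY only so the query also runs under ONLY_FULL_GROUP_BY.
SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY o.property_B, w.score;

DROP TEMPORARY TABLE IF EXISTS w;
```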
The second solution is using MySQL version 5.6 or above. As of MySQL 5.6.3, when the result of a sub-query in the FROM clause has to be materialized, the optimizer may add an index to the materialized table by itself, so the join can use ref access much like it does with your indexed temporary table.
[UPDATE]
For the edited version of the question I would recommend the following solution:
SELECT o.property_B, SUM(IF(o.property_A = 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B
HAVING SUM(IF(o.property_A = 'specific_A', o.score1, 0)) > 0;
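But you need to work on the HAVING part. You might need to change it according to your actual problem: as written, it drops every group whose 'specific_A' scores sum to zero or less, not only the groups that have no 'specific_A' rows at all.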
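To make the point about HAVING concrete, here is a small sketch. The table o_demo and its four rows are made up for illustration, and the second HAVING is only one possible variant, not necessarily what your real problem needs.

```sql
-- Hypothetical demo table with the same columns as o.
CREATE TEMPORARY TABLE o_demo (
    property_A VARCHAR(20),
    property_B VARCHAR(20),
    score1 INT,
    score2 INT
);

INSERT INTO o_demo (property_A, property_B, score1, score2) VALUES
    ('specific_A', 'b1', 10, 1),
    ('other_A',    'b1', 20, 2),
    ('specific_A', 'b2',  0, 3),
    ('other_A',    'b3',  5, 4);

-- HAVING as in the answer: keeps groups whose conditional sum is positive.
SELECT property_B,
       SUM(IF(property_A = 'specific_A', score1, 0)) AS score1_sum,
       SUM(score2) AS score2_sum
FROM o_demo
GROUP BY property_B
HAVING SUM(IF(property_A = 'specific_A', score1, 0)) > 0
ORDER BY property_B;
-- Returns (b1, 10, 3). b2 is dropped because its 'specific_A' scores sum to 0.

-- Variant: count the matching rows instead of summing their scores.
SELECT property_B,
       SUM(IF(property_A = 'specific_A', score1, 0)) AS score1_sum,
       SUM(score2) AS score2_sum
FROM o_demo
GROUP BY property_B
HAVING SUM(IF(property_A = 'specific_A', 1, 0)) > 0
ORDER BY property_B;
-- Returns (b1, 10, 3) and (b2, 0, 3). Only b3, which has no 'specific_A'
-- rows, is dropped, which is what the original join with WHERE produces.

DROP TEMPORARY TABLE IF EXISTS o_demo;
```

The second form keeps a group whenever at least one 'specific_A' row exists, regardless of the score values.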
I am not really that familiar with MySQL; I have mostly worked with Oracle. If you want a WHERE-like condition inside the SUM, you can use DECODE (in Oracle) or CASE. It would look something like this:
SELECT o.property_B, SUM(DECODE(property_A, 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B;
or with CASE:
SELECT o.property_B, SUM(CASE
WHEN property_A = 'specific_A' THEN o.score1
ELSE 0
END ),
SUM(o.score2)
FROM o
GROUP BY property_B;
I do not see why you would need the join at all. I would assume that
SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;
should give what you want, but with a much simpler statement that is therefore easier to optimize.