About query optimizations, I'm wondering if statements like one below get optimized:
select *
from (
select *
from table1 t1
join table2 t2 using (entity_id)
order by t2.sort_order, t1.name
) as foo -- main query of object
where foo.name = ?; -- inserted
Consider that the query is taken care by a dependency object but just (rightly?) allows one to tack in a WHERE condition. I'm thinking that at least not a lot of data gets pulled in to your favorite language, but I'm having second thoughts if that's an adequate optimization and maybe the database is still taking some time going through the query.
Or is it better to take that query out and write a separate query method that has the where and maybe a LIMIT 1 clause, too?
In MySQL, no.
The predicate in an outer query does not get "pushed" down into the inline view query.
The query in the inline view is processed first, independent of the outer query. (MySQL will optimize that view query just like it would optimize that query if you submitted that separately.)
The way that MySQL processes this query: the inline view query gets run first, the result is materialized as a 'derived table'. That is, the result set from that query gets stored as a temporary table, in memory in some cases (if it's small enough, and doesn't contain any columns that aren't supported by the MEMORY engine. Otherwise, it's spun out to disk with as a MyISAM table, using the MyISAM storage engine.
Once the derived table is populated, then the outer query runs.
(Note that the derived table does not have any indexes on it. That's true in MySQL versions before 5.6; I think there are some improvements in 5.6 where MySQL will actually create an index.
Clarification: indexes on derived tables: As of MySQL 5.6.3 "During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Reference: http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
Also, I don't think MySQL "optimizes out" any unneeded columns from the inline view. If the inline view query is a SELECT *, then all of the columns will be represented in the derived table, whether those are referenced in the outer query or not.
This can lead to some significant performance issues, especially when we don't understand how MySQL processes a statement. (And the way that MySQL processes a statement is significantly different from other relational databases, like Oracle and SQL Server.)
You may have heard a recommendation to "avoid using views in MySQL". The reasoning behind this general advice (which applies to both "stored" views and "inline" views) is the significant performance issues that can be unnecessarily introduced.
As an example, for this query:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
) q
WHERE q.id = 42
MySQL does not "push" the predicate id=42 down into the view definition. MySQL first runs the inline view query, and essentially creates a copy of huge_table, as an un-indexed MyISAM table. Once that is done, then the outer query will scan the copy of the table, to locate the rows satisfying the predicate.
If we instead re-write the query to "push" the predicate into the view definition, like this:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
WHERE h.id = 42
) q
We expect a much smaller resultset to be returned from the view query, and the derived table should be much smaller. MySQL will also be able to make effective use of an index ON huge_table (id). But there's still some overhead associated with materializing the derived table.
If we eliminate the unnecessary columns from the view definition, that can be more efficient (especially if there are a lot of columns, there are any large columns, or any columns with datatypes not supported by the MEMORY engine):
SELECT q.name
FROM ( SELECT h.name
FROM huge_table h
WHERE h.id = 42
) q
And it would be even more efficient to eliminate the inline view entirely:
SELECT q.name
FROM huge_table q
WHERE q.id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With