I've tested the following seemingly simple query on MySQL 5.0, 5.1, 5.5 and found it to be extremely slow.
select * from entry where session_id in
(select session_id from entry where created_at > [some timestamp])
Multiple entry's can have the same session ID, but different created_at timestamps. The query is meant to grab all entry's that have at least one entry from the same session_id whose created_at is greater than the specified timestamp.
I've seen others speak of MySQL subquery performance issues with similar queries, and that MySQL considers the subquery a dependent query and it is doing a full table scan on the outer query. Suggested workarounds were something like:
select * from entry where session_id in
(select session_id from
(select session_id from entry where created_at > [some timestamp])
as temp)
However, this hack doesn't work for me and makes it even slower.
Any ideas on how to rewrite this query?
Solution: Use a SubqueryTo speed this up, we suggested they rewrite the query to use a new feature of CrateDB: subquery expressions. In this case, the IN (SUBQUERY) expression. This query optimization approach works well when one of the two joined tables has a relatively small number of rows.
For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL .
I was having a problem with the double subquery trick too, btw I just found out that using this worked for me (based on your query):
select * from entry where session_id in
(select (select session_id from entry where created_at > [some timestamp]))
In my case the original query could work for hours using a join or the "normal" double subquery trick, with the modified double subquery it took 0 secs :)
Depending on your data distribution, use this
SELECT e.*
FROM (
SELECT session_id, MAX(created_at)
FROM entry
GROUP BY
session_id
HAVING MAX(created_at) > $mytimestamp
) ed
JOIN entry e
ON e.session_id = ed.session_id
(create an index on (session_id, created_at)
), or this:
SELECT DISTINCT e.*
FROM entry ed
JOIN entry e
ON e.session_id = ed.session_id
WHERE ed.created_at > $mytimestamp
(create two separate indexes on created_at
and session_id
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With