Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL subquery really slow... Workaround?

I've tested the following seemingly simple query on MySQL 5.0, 5.1, 5.5 and found it to be extremely slow.

select * from entry where session_id in
    (select session_id from entry where created_at > [some timestamp])

Multiple entry's can have the same session ID, but different created_at timestamps. The query is meant to grab all entry's that have at least one entry from the same session_id whose created_at is greater than the specified timestamp.

I've seen others speak of MySQL subquery performance issues with similar queries, and that MySQL considers the subquery a dependent query and it is doing a full table scan on the outer query. Suggested workarounds were something like:

select * from entry where session_id in
    (select session_id from
        (select session_id from entry where created_at > [some timestamp])
    as temp)

However, this hack doesn't work for me and makes it even slower.

Any ideas on how to rewrite this query?

like image 783
n00b Avatar asked May 13 '11 21:05

n00b


People also ask

How do I make subquery faster?

Solution: Use a SubqueryTo speed this up, we suggested they rewrite the query to use a new feature of CrateDB: subquery expressions. In this case, the IN (SUBQUERY) expression. This query optimization approach works well when one of the two joined tables has a relatively small number of rows.

Why is subquery slow?

For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL .


2 Answers

I was having a problem with the double subquery trick too, btw I just found out that using this worked for me (based on your query):

select * from entry where session_id in
    (select (select session_id from entry where created_at > [some timestamp]))

In my case the original query could work for hours using a join or the "normal" double subquery trick, with the modified double subquery it took 0 secs :)

like image 31
r4dius Avatar answered Sep 19 '22 06:09

r4dius


Depending on your data distribution, use this

SELECT  e.*
FROM    (
        SELECT  session_id, MAX(created_at)
        FROM    entry
        GROUP BY
                session_id
        HAVING  MAX(created_at) > $mytimestamp
        ) ed
JOIN    entry e
ON      e.session_id = ed.session_id

(create an index on (session_id, created_at)), or this:

SELECT  DISTINCT e.*
FROM    entry ed
JOIN    entry e
ON      e.session_id = ed.session_id
WHERE   ed.created_at > $mytimestamp

(create two separate indexes on created_at and session_id)

like image 193
Quassnoi Avatar answered Sep 20 '22 06:09

Quassnoi