Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL subquery and temporary table is slow

Tags:

mysql

subquery

I wanted to optimize the following query:

SELECT SQL_NO_CACHE t.topic_id
FROM bb_topics t, bb_posters ps
WHERE t.topic_id = ps.topic_id
AND forum_id IN (2, 6, 7, 10, 15, 20)
ORDER BY ps.timestamp desc
LIMIT 20

Query took 0.1475 sec

So at first I replaced WHERE IN with an INNER JOIN subquery:

SELECT SQL_NO_CACHE t.topic_id
FROM ( SELECT * FROM bb_topics WHERE forum_id IN (2, 6, 7, 10, 15, 20) ) t
INNER JOIN bb_posters ps ON t.topic_id = ps.topic_id
ORDER BY ps.timestamp desc
LIMIT 20

Query took 0.1541 sec

Then I tried to create a temporary table:

CREATE TEMPORARY TABLE IF NOT EXISTS bb_topics_tmp ( INDEX(topic_id) )
ENGINE=MEMORY
AS ( SELECT * FROM bb_topics WHERE forum_id IN (2, 6, 7, 10, 15, 20) );

SELECT SQL_NO_CACHE t.topic_id
FROM bb_topics_tmp t, bb_posters ps
AND t.topic_id = ps.topic_id
ORDER BY ps.timestamp desc
LIMIT 20

Query took 0.1467 sec

I don't understand why selecting from a complete table with 38,522 rows is much faster than from a temporary one with 9,943 rows:

SELECT SQL_NO_CACHE t.topic_id
FROM bb_topics t, bb_posters ps
WHERE t.topic_id = ps.topic_id
ORDER BY ps.timestamp desc
LIMIT 20

Query took 0.0006 sec

Both topic_id and timestamp have indexes.

Funny thing is that even using something like this is much faster than the list of forums:

AND pt.post_text LIKE '%searchterm%'

UPD:

Here is the output of EXPLAIN:

SELECT SQL_NO_CACHE t.topic_id, t.topic_title, ps.timestamp, u.username,
u.user_id, ps.size, ps.downloaded, ROUND(a.rating_sum/a.rating_count) AS Rating,
a.attach_id, pt.bbcode_uid, pt.post_text
FROM bb_topics t
JOIN bb_posters ps ON ps.topic_id = t.topic_id
LEFT JOIN bb_users u ON u.user_id = t.topic_poster
LEFT JOIN bb_posts_text pt ON pt.post_id = bt.post_id
LEFT JOIN bb_attachments_desc a ON bt.attach_id = a.attach_id
WHERE t.forum_id IN (2, 6, 7, 10, 15, 20)
ORDER BY ps.timestamp desc
LIMIT 1, 20

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  t   range   PRIMARY,forum_id    forum_id    2   NULL    8379    Using where; Using temporary; Using filesort
1   SIMPLE  ps  eq_ref  topic_id    topic_id    3   DB.t.topic_id       1    
1   SIMPLE  u   eq_ref  PRIMARY     PRIMARY     3   DB.t.topic_poster   1   Using index
1   SIMPLE  pt  eq_ref  PRIMARY     PRIMARY     3   DB.bt.post_id       1   Using index
1   SIMPLE  a   eq_ref  PRIMARY     PRIMARY     3   DB.bt.attach_id     1   Using index

Query took 0.8527 sec

The same query without WHERE t.forum_id IN:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  ps  index   topic_id    timestamp   4   NULL                21   
1   SIMPLE  t   eq_ref  PRIMARY     PRIMARY     3   DB.bt.topic_id      1    
1   SIMPLE  u   eq_ref  PRIMARY     PRIMARY     3   DB.t.topic_poster   1    
1   SIMPLE  pt  eq_ref  PRIMARY     PRIMARY     3   DB.bt.post_id       1    
1   SIMPLE  a   eq_ref  PRIMARY     PRIMARY     3   DB.bt.attach_id     1    

Query took 0.0022 sec

UPD 2:

Adding USE INDEX (timestamp) solved the problem:

SELECT SQL_NO_CACHE t.topic_id, t.topic_title, ps.timestamp, u.username,
u.user_id, ps.size, ps.downloaded, ROUND(a.rating_sum/a.rating_count) AS Rating,
a.attach_id, pt.bbcode_uid, pt.post_text
FROM bb_topics t
JOIN bb_posters ps USE INDEX (timestamp) ON ps.topic_id = t.topic_id
LEFT JOIN bb_users u ON u.user_id = t.topic_poster
LEFT JOIN bb_posts_text pt ON pt.post_id = bt.post_id
LEFT JOIN bb_attachments_desc a ON bt.attach_id = a.attach_id
WHERE t.forum_id IN (2, 6, 7, 10, 15, 20)
ORDER BY ps.timestamp desc
LIMIT 1, 20

Query took 0.0023 sec
like image 263
amq Avatar asked Oct 21 '22 12:10

amq


1 Answers

These aren't terribly difficult queries. You're doing the right thing by using SQL_NO_CACHE, and by timing them. But you also need to look at the results of EXPLAIN.

Use JOIN syntax instead of comma-separated table lists. The queries should be equivalent, but the old style syntax is harder to understand.

SELECT SQL_NO_CACHE 
       t.topic_id
  FROM bb_topics  AS t
  JOIN bb_posters AS ps ON t.topic_id = ps.topic_id
 WHERE t.forum_id IN (2, 6, 7, 10, 15, 20)
 ORDER BY ps.timestamp desc
 LIMIT 20

Try using some composite (multicolumn) covering indexes to get your performance to the next level.

You need to order the bb_posters table by timestamp, and you need the topic_id. So try this index: (timestamp, topic_id) If you can use a statement like

    WHERE ps.timestamp >= DATE(NOW()) - INTERVAL 7 DAY

to limit the timeframe of the search, it will help performance even more.

You need topic_id and forum_id from your bb_topics table. So try this index (topic_id, forum_id)

You can use similar composite covering indexes for the other tables you're trying to join.

If your tables are well-indexed, queries on them should be just as efficient as queries on temporary tables. Creating temporary tables tends to do things to the server like flush out table data cached in RAM, which has an unintended negative effect on performance.

like image 123
O. Jones Avatar answered Oct 27 '22 18:10

O. Jones