
Mysql slow query: INNER JOIN + ORDER BY causes filesort

I'm trying to optimize this query:

SELECT `posts`.* FROM `posts` INNER JOIN `posts_tags` 
     ON `posts`.id = `posts_tags`.post_id 
     WHERE  (((`posts_tags`.tag_id = 1))) 
     ORDER BY posts.created_at DESC;

The tables have 38k and 31k rows, and MySQL uses "filesort", so the query gets pretty slow. I tried different indexes with no luck.

CREATE TABLE `posts` (
  `id` int(11) NOT NULL auto_increment,
  `created_at` datetime default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_posts_on_created_at` (`created_at`),
  KEY `for_tags` (`trashed`,`published`,`clan_private`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=44390 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `posts_tags` (
  `id` int(11) NOT NULL auto_increment,
  `post_id` int(11) default NULL,
  `tag_id` int(11) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_posts_tags_on_post_id_and_tag_id` (`post_id`,`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=63175 DEFAULT CHARSET=utf8

EXPLAIN output:

+----+-------------+------------+--------+--------------------------+--------------------------+---------+---------------------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys            | key                      | key_len | ref                 | rows  | Extra                                                     |
+----+-------------+------------+--------+--------------------------+--------------------------+---------+---------------------+-------+-----------------------------------------------------------+
|  1 | SIMPLE      | posts_tags | index  | index_post_id_and_tag_id | index_post_id_and_tag_id | 10      | NULL                | 24159 | Using where; Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | posts      | eq_ref | PRIMARY                  | PRIMARY                  | 4       | .posts_tags.post_id |     1 |                                                           | 
+----+-------------+------------+--------+--------------------------+--------------------------+---------+---------------------+-------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

What kind of index do I need to define to keep MySQL from using filesort? Is that even possible when the ORDER BY field is not in the WHERE clause?

Update: profiling results:

mysql> show profile for query 1;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000027 | 
| checking query cache for query | 0.037953 | 
| Opening tables                 | 0.000028 | 
| System lock                    | 0.010382 | 
| Table lock                     | 0.023894 | 
| init                           | 0.000057 | 
| optimizing                     | 0.010030 | 
| statistics                     | 0.000026 | 
| preparing                      | 0.000018 | 
| Creating tmp table             | 0.128619 | 
| executing                      | 0.000008 | 
| Copying to tmp table           | 1.819463 | 
| Sorting result                 | 0.001092 | 
| Sending data                   | 0.004239 | 
| end                            | 0.000012 | 
| removing tmp table             | 0.000885 | 
| end                            | 0.000006 | 
| end                            | 0.000005 | 
| query end                      | 0.000006 | 
| storing result in query cache  | 0.000005 | 
| freeing items                  | 0.000021 | 
| closing tables                 | 0.000013 | 
| logging slow query             | 0.000004 | 
| cleaning up                    | 0.000006 | 
+--------------------------------+----------+
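A profile like this one can be gathered roughly as follows. This is only a sketch: the query ID passed to SHOW PROFILE depends on what SHOW PROFILES lists for the session. In the output above, almost all of the time goes to "Copying to tmp table" (about 1.82 s), which fits the "Using temporary; Using filesort" reported by EXPLAIN.

```sql
-- Enable profiling for the current session only.
SET profiling = 1;

-- Run the statement to be measured (the simplified query from above).
SELECT `posts`.* FROM `posts` INNER JOIN `posts_tags`
     ON `posts`.id = `posts_tags`.post_id
     WHERE `posts_tags`.tag_id = 1
     ORDER BY posts.created_at DESC;

-- List recent statements with their Query_ID, then show the stages of one.
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
```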

Update 2: the real query (a few more boolean fields and more useless indexes):

SELECT `posts`.* FROM `posts` INNER JOIN `posts_tags` 
   ON `posts`.id = `posts_tags`.post_id 
   WHERE ((`posts_tags`.tag_id = 7971)) 
       AND (((posts.trashed = 0) 
       AND (`posts`.`published` = 1 
       AND `posts`.`clan_private` = 0)) 
       AND ((`posts_tags`.tag_id = 7971)))  
   ORDER BY created_at DESC LIMIT 0, 10; 

Empty set (1.25 sec)

Without ORDER BY the same query takes 0.01s.


+----+-------------+------------+--------+-----------------------------------------+-----------------------+---------+---------------------+-------+--------------------------+
| id | select_type | table      | type   | possible_keys                           | key                   | key_len | ref                 | rows  | Extra                    |
+----+-------------+------------+--------+-----------------------------------------+-----------------------+---------+---------------------+-------+--------------------------+
|  1 | SIMPLE      | posts_tags | index  | index_posts_tags_on_post_id_and_tag_id  | index_posts_tags_...  | 10      | NULL                | 23988 | Using where; Using index | 
|  1 | SIMPLE      | posts      | eq_ref | PRIMARY,index_posts_on_trashed_and_crea | PRIMARY               | 4       | .posts_tags.post_id |     1 | Using where              | 
+----+-------------+------------+--------+-----------------------------------------+-----------------------+---------+---------------------+-------+--------------------------+

SOLUTION

  1. Query updated to "ORDER BY posts_tags.created_at DESC" (two small changes in app code)
  2. Index added: index_posts_tags_on_created_at.

That's all!
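
For reference, the two changes might look roughly like this. It is a sketch only: the index name is the one from step 2, and the query is the simplified one from the top of the question. Note that it sorts by when the tag was attached (posts_tags.created_at), which matches the post's creation time only if tags are added when the post is created; that is an assumption about the application.

```sql
-- Step 2: single-column index on the tagging timestamp.
ALTER TABLE posts_tags
  ADD INDEX index_posts_tags_on_created_at (created_at);

-- Step 1: ORDER BY now refers to a column of posts_tags, the first table
-- in the join, so MySQL may be able to read rows in index order instead
-- of copying them to a temporary table and sorting.
SELECT posts.*
FROM posts
INNER JOIN posts_tags ON posts.id = posts_tags.post_id
WHERE posts_tags.tag_id = 1
ORDER BY posts_tags.created_at DESC;
```

Re-running EXPLAIN on the new query should show whether "Using temporary; Using filesort" is gone.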

Alexander asked Jun 10 '10 14:06



2 Answers

You would need to denormalize a bit and copy the posts.created_at field into the posts_tags table (I called it post_created_at, but you can name it whatever you want):

CREATE TABLE `posts_tags` (
  `id` int(11) NOT NULL auto_increment,
  `post_id` int(11) default NULL,
  `tag_id` int(11) default NULL,
  `post_created_at` datetime default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_posts_tags_on_post_id_and_tag_id` (`post_id`,`tag_id`)
) ENGINE=InnoDB;

and then add an index to posts_tags on

(tag_id, post_created_at)

That will allow the query to get all the posts for a tag, in the correct order, without filesort.
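
On an existing table, the same change could be applied roughly as follows. This is a sketch under a few assumptions: the index name is made up for illustration, the backfill runs once, and the application (or a trigger) keeps post_created_at in sync if posts.created_at ever changes.

```sql
-- Add the denormalized column to the existing join table.
ALTER TABLE posts_tags
  ADD COLUMN post_created_at datetime DEFAULT NULL;

-- Backfill it from posts (MySQL multi-table UPDATE syntax).
UPDATE posts_tags
  INNER JOIN posts ON posts.id = posts_tags.post_id
SET posts_tags.post_created_at = posts.created_at;

-- Composite index: equality column first, sort column second.
-- The index name is hypothetical.
ALTER TABLE posts_tags
  ADD INDEX index_posts_tags_on_tag_id_and_post_created_at (tag_id, post_created_at);

-- Filter and sort on posts_tags only; posts is joined by primary key.
SELECT posts.*
FROM posts_tags
INNER JOIN posts ON posts.id = posts_tags.post_id
WHERE posts_tags.tag_id = 1
ORDER BY posts_tags.post_created_at DESC;
```

With tag_id fixed by the WHERE clause, the entries for that tag are already ordered by post_created_at inside the index, so MySQL can walk them in reverse for DESC rather than sorting.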

nathan answered Oct 27 '22 18:10


Try changing KEY index_posts_tags_on_post_id_and_tag_id (post_id,tag_id) to KEY index_posts_tags_tag_id (tag_id) and repost the EXPLAIN output.

What is the distribution of tag_id values within posts_tags?
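
Something along these lines should do both, as a rough sketch. The LIMIT of 20 and the posts_count alias are arbitrary choices, and dropping the old index assumes nothing else relies on it.

```sql
-- Swap the (post_id, tag_id) index for one that leads with tag_id.
ALTER TABLE posts_tags
  DROP INDEX index_posts_tags_on_post_id_and_tag_id,
  ADD INDEX index_posts_tags_tag_id (tag_id);

-- Re-check the plan for the original query.
EXPLAIN SELECT posts.*
FROM posts
INNER JOIN posts_tags ON posts.id = posts_tags.post_id
WHERE posts_tags.tag_id = 1
ORDER BY posts.created_at DESC;

-- Tag distribution: how many rows each of the most common tags has.
SELECT tag_id, COUNT(*) AS posts_count
FROM posts_tags
GROUP BY tag_id
ORDER BY posts_count DESC
LIMIT 20;
```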

Gary answered Oct 27 '22 20:10