Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL performance optimization: order by datetime field

I have a table with roughly 100.000 blog postings, linked to a table with 50 feeds via an 1:n relationship. When I query both tables with a select statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings table (simplified):

+---------------------+--------------+------+-----+---------+----------------+ | Field               | Type         | Null | Key | Default | Extra          | +---------------------+--------------+------+-----+---------+----------------+ | id                  | int(11)      | NO   | PRI | NULL    | auto_increment | | feed_id             | int(11)      | NO   | MUL | NULL    |                | | crawl_date          | datetime     | NO   |     | NULL    |                | | is_active           | tinyint(1)   | NO   | MUL | 0       |                | | link                | varchar(255) | NO   | MUL | NULL    |                | | author              | varchar(255) | NO   |     | NULL    |                | | title               | varchar(255) | NO   |     | NULL    |                | | excerpt             | text         | NO   |     | NULL    |                | | long_excerpt        | text         | NO   |     | NULL    |                | | user_offtopic_count | int(11)      | NO   | MUL | 0       |                | +---------------------+--------------+------+-----+---------+----------------+ 

And here's the feed table:

+-------------+--------------+------+-----+---------+----------------+ | Field       | Type         | Null | Key | Default | Extra          | +-------------+--------------+------+-----+---------+----------------+ | id          | int(11)      | NO   | PRI | NULL    | auto_increment | | type        | int(11)      | NO   | MUL | 0       |                | | title       | varchar(255) | NO   |     | NULL    |                | | website     | varchar(255) | NO   |     | NULL    |                | | url         | varchar(255) | NO   |     | NULL    |                | +-------------+--------------+------+-----+---------+----------------+ 

And here's the query that takes >1 second to execute. Please note that the post_date field has an index, but MySQL isn't using it to sort the postings table:

SELECT      `postings`.`id`,      UNIX_TIMESTAMP(postings.post_date) as post_date,      `postings`.`link`,      `postings`.`title`,      `postings`.`author`,      `postings`.`excerpt`,      `postings`.`long_excerpt`,      `feeds`.`title` AS feed_title,      `feeds`.`website` AS feed_website FROM      (`postings`) JOIN      `feeds`  ON      `feeds`.`id` = `postings`.`feed_id` WHERE      `feeds`.`type` = 1 AND      `postings`.`user_offtopic_count` < 10 AND      `postings`.`is_active` = 1 ORDER BY      `postings`.`post_date` desc LIMIT      15   

The result of the explain extended command on this query shows that MySQL is using filesort:

+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+ | id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       | +----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+ |  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort | |  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 | +----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+ 

When I remove the order by part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things such as creating a combined index on all where/order by fields, as suggested by a few blog postings, but this didn't work either.

like image 837
Dennis G. Avatar asked Apr 03 '09 17:04

Dennis G.


1 Answers

Create a composite index either on postings (is_active, post_date) (in that order).

It will be used both for filtering on is_active and ordering by post_date.

MySQL should show REF access method over this index in EXPLAIN EXTENDED.

Note that you have a RANGE filtering condition over user_offtopic_count, that's why you cannot use an index over this field both in filtering and in sorting by other field.

Depending on how selective is your user_offtopic_count (i. e. how many rows satisfy user_offtopic_count < 10), it may be more useful to create an index on user_offtopic_count and let the post_dates be sorted.

To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure the RANGE access method over this index is used.

Which index will be faster depends on your data distribuion. Create both indexes, FORCE them and see which is faster:

CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count); CREATE INDEX ix_active_date ON postings (is_active, post_date);  SELECT      `postings`.`id`,      UNIX_TIMESTAMP(postings.post_date) as post_date,      `postings`.`link`,      `postings`.`title`,      `postings`.`author`,      `postings`.`excerpt`,      `postings`.`long_excerpt`,      `feeds`.`title` AS feed_title,      `feeds`.`website` AS feed_website FROM      `postings` FORCE INDEX (ix_active_offtopic) JOIN      `feeds`  ON      `feeds`.`id` = `postings`.`feed_id` WHERE      `feeds`.`type` = 1 AND      `postings`.`user_offtopic_count` < 10 AND      `postings`.`is_active` = 1 ORDER BY      `postings`.`post_date` desc LIMIT      15  /* This should show RANGE access with few rows and keep the FILESORT */  SELECT      `postings`.`id`,      UNIX_TIMESTAMP(postings.post_date) as post_date,      `postings`.`link`,      `postings`.`title`,      `postings`.`author`,      `postings`.`excerpt`,      `postings`.`long_excerpt`,      `feeds`.`title` AS feed_title,      `feeds`.`website` AS feed_website FROM      `postings` FORCE INDEX (ix_active_date) JOIN      `feeds`  ON      `feeds`.`id` = `postings`.`feed_id` WHERE      `feeds`.`type` = 1 AND      `postings`.`user_offtopic_count` < 10 AND      `postings`.`is_active` = 1 ORDER BY      `postings`.`post_date` desc LIMIT      15  /* This should show REF access with lots of rows and no FILESORT */ 
like image 194
Quassnoi Avatar answered Sep 28 '22 09:09

Quassnoi