I have a table with roughly 100,000 blog postings, linked to a table with 50 feeds via a 1:n relationship. When I query both tables with a SELECT statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings table (simplified):
+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+
And here's the feeds table:
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
And here's the query that takes >1 second to execute. Please note that the post_date field has an index, but MySQL isn't using it to sort the postings table:
SELECT `postings`.`id`,
       UNIX_TIMESTAMP(postings.post_date) as post_date,
       `postings`.`link`,
       `postings`.`title`,
       `postings`.`author`,
       `postings`.`excerpt`,
       `postings`.`long_excerpt`,
       `feeds`.`title` AS feed_title,
       `feeds`.`website` AS feed_website
FROM (`postings`)
JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id`
WHERE `feeds`.`type` = 1
  AND `postings`.`user_offtopic_count` < 10
  AND `postings`.`is_active` = 1
ORDER BY `postings`.`post_date` desc
LIMIT 15
The result of the EXPLAIN EXTENDED command on this query shows that MySQL is using filesort:
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
When I remove the ORDER BY part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things such as creating a combined index on all WHERE/ORDER BY fields, as suggested by a few blog postings, but this didn't work either.
Create a composite index on postings (is_active, post_date) (in that order).
It will be used both for filtering on is_active and for ordering by post_date.
MySQL should show the REF access method over this index in EXPLAIN EXTENDED.
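One way to check this in isolation is to run EXPLAIN on a cut-down, single-table version of the query. The following is only a sketch: it assumes the index already exists under the name ix_active_date (the name used further down in this answer), and the shortened column list is just for illustration.

```sql
-- Assumes this index exists:
--   CREATE INDEX ix_active_date ON postings (is_active, post_date);
EXPLAIN
SELECT id, post_date
FROM postings
WHERE is_active = 1        -- equality on the leading index column
ORDER BY post_date DESC    -- order follows the second index column
LIMIT 15;
```

If the optimizer picks ix_active_date, the Extra column should no longer list Using filesort. If it picks a different index, the sort will most likely still be there.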
Note that you have a RANGE filtering condition over user_offtopic_count, which is why you cannot use a single index both for filtering on this field and for sorting by another field.
Depending on how selective your user_offtopic_count condition is (i.e. how many rows satisfy user_offtopic_count < 10), it may be more useful to create an index on user_offtopic_count and let the matching rows be sorted by post_date with a filesort.
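A rough way to gauge that selectivity is to count the rows directly. This sketch relies on MySQL evaluating a comparison to 1 or 0, so SUM counts the rows that match; the column aliases are made up for illustration.

```sql
SELECT COUNT(*)                                 AS active_rows,
       SUM(user_offtopic_count < 10)            AS active_low_offtopic,
       SUM(user_offtopic_count < 10) / COUNT(*) AS fraction_kept
FROM postings
WHERE is_active = 1;
```

A fraction_kept close to 1 means the range condition filters out very little, so an index that avoids the sort is probably the better bet. A small fraction suggests that filtering first and sorting the few remaining rows may be cheaper.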
To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure the RANGE access method over this index is used.
Which index will be faster depends on your data distribution. Create both indexes, FORCE them and see which is faster:
CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT `postings`.`id`,
       UNIX_TIMESTAMP(postings.post_date) as post_date,
       `postings`.`link`,
       `postings`.`title`,
       `postings`.`author`,
       `postings`.`excerpt`,
       `postings`.`long_excerpt`,
       `feeds`.`title` AS feed_title,
       `feeds`.`website` AS feed_website
FROM `postings` FORCE INDEX (ix_active_offtopic)
JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id`
WHERE `feeds`.`type` = 1
  AND `postings`.`user_offtopic_count` < 10
  AND `postings`.`is_active` = 1
ORDER BY `postings`.`post_date` desc
LIMIT 15;
/* This should show RANGE access with few rows and keep the FILESORT */

SELECT `postings`.`id`,
       UNIX_TIMESTAMP(postings.post_date) as post_date,
       `postings`.`link`,
       `postings`.`title`,
       `postings`.`author`,
       `postings`.`excerpt`,
       `postings`.`long_excerpt`,
       `feeds`.`title` AS feed_title,
       `feeds`.`website` AS feed_website
FROM `postings` FORCE INDEX (ix_active_date)
JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id`
WHERE `feeds`.`type` = 1
  AND `postings`.`user_offtopic_count` < 10
  AND `postings`.`is_active` = 1
ORDER BY `postings`.`post_date` desc
LIMIT 15;
/* This should show REF access with lots of rows and no FILESORT */