Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unable to optimize MySQL query which uses a ORDER BY clause

I'm using Drupal 6 with MySQL version 5.0.95 and at an impasse where one of my queries which displays content based on most recent article date slows down and because of the frequency of being used kills the site performance altogether. The query in question is as below:

     SELECT n.nid, 
            n.title, 
            ma.field_article_date_format_value, 
            ma.field_article_summary_value
       FROM node n 
 INNER JOIN content_type_article ma ON n.nid=ma.nid
 INNER JOIN term_node tn            ON n.nid=tn.nid 
      WHERE tn.tid= 153 
        AND n.status=1 
   ORDER BY ma.field_article_date_format_value DESC 
      LIMIT 0, 11;

The EXPLAIN of the query shows the below result:

+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys            | key     | key_len | ref                  | rows  | Extra                           |
+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+
|  1 | SIMPLE      | tn    | ref    | PRIMARY,nid              | PRIMARY | 4       | const                | 19006 | Using temporary; Using filesort |
|  1 | SIMPLE      | ma    | ref    | nid,ix_article_date      | nid     | 4       | drupal_mm_stg.tn.nid |     1 |                                 |
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,node_status_type | PRIMARY | 4       | drupal_mm_stg.ma.nid |     1 | Using where                     |
+----+-------------+-------+--------+--------------------------+---------+---------+----------------------+-------+---------------------------------+

This query seemed relatively simple and straight forward and retrieves articles which belong to a category (term) 153 and are of status 1 (published). But apparently Using temporary table and Using filesort means the query is bound to fail from what I've learnt browsing about it.

Removing field_article_date_format_value from the ORDER BY clause solves the Using temporary; Using filesort reduces the query execution time but is required and cannot be traded off, unfortunately same holds equally true for the site performance.

My hunch is that most of the trouble comes from the term_node table which maps articles to categories and is a many-many relationship table meaning if article X is associated to 5 categories C1....C5 it will have 5 entries in that table, this table is from out-of-the-box drupal.

Dealing with heavy DB content is something new to me and going through some of the similar queries ( When ordering by date desc, "Using temporary" slows down query, MySQL performance optimization: order by datetime field) I tried to create a composite index for the content_type_article whose datetime field is used in the ORDER BY clause along with another key (nid) in it and tried to FORCE INDEX.

    SELECT n.nid, n.title,
           ma.field_article_date_format_value, 
           ma.field_article_summary_value 
      FROM node n 
INNER JOIN content_type_article ma FORCE INDEX (ix_article_date) ON n.nid=ma.nid 
INNER JOIN term_node tn ON n.nid=tn.nid 
     WHERE tn.tid= 153 
       AND n.status=1 
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;

The result and the following EXPLAIN query did not seem to help much

+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys            | key             | key_len | ref                  | rows  | Extra                           |
+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+
|  1 | SIMPLE      | tn    | ref    | PRIMARY,nid              | PRIMARY         | 4       | const                | 18748 | Using temporary; Using filesort |
|  1 | SIMPLE      | ma    | ref    | ix_article_date          | ix_article_date | 4       | drupal_mm_stg.tn.nid |     1 |                                 |
|  1 | SIMPLE      | n     | eq_ref | PRIMARY,node_status_type | PRIMARY         | 4       | drupal_mm_stg.ma.nid |     1 | Using where                     |
+----+-------------+-------+--------+--------------------------+-----------------+---------+----------------------+-------+---------------------------------+

The fields n.nid, ca.nid, ma.field_article_date_format_value are all indexed. Querying the DB with Limit 0,11 takes approximately 7-10 seconds with the ORDER BY clause but without it the query barely takes a second. The database engine is MyISAM. Any help on this would be greatly appreciated.

Any answer that could help me in getting this query like a normal one (at the same speed as a query without sort by date) would be great. My attempts with creating a composite query as a combination of nid and field_article_date_format_value and use in the query did not help the cause. I'm open to providing additional info on the problem and any new suggestions.

like image 430
optimusprime619 Avatar asked Dec 13 '12 00:12

optimusprime619


2 Answers

Taking a look at your query and the explain, it seems like having the n.status=1 in the where clause is making the search very inefficient because you need to return the whole set defined by the joins and then apply the status = 1. Try starting the join from the term_node table that is inmediately filtered by the WHERE and then make the joins adding the status condition immediately. Give it a try and please tell me how it goes.

 SELECT n.nid, n.title,
           ma.field_article_date_format_value, 
           ma.field_article_summary_value 
      FROM term_node tn
INNER JOIN node n ON n.nid=tn.nid AND n.status=1
INNER JOIN content_type_article ma FORCE INDEX (ix_article_date) ON n.nid=ma.nid 
     WHERE tn.tid= 153 
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;
like image 90
Mickle Foretic Avatar answered Sep 25 '22 03:09

Mickle Foretic


Using temporary; Using filesort means only that MySQL needs to construct a temporary result table and sort it to get the result you need. This is often a consequence of the ORDER BY ... DESC LIMIT 0,n construct you're using to get the latest postings. In itself it's not a sign of failure. See this: http://www.mysqlperformanceblog.com/2009/03/05/what-does-using-filesort-mean-in-mysql/

Here are some things to try. I am not totally sure they'll work; it's hard to know without having your data to experiment with.

Is there a BTREE index on content_type_article.field_article_date_format_value ? If so, that may help.

Do you HAVE to display the 11 most recent articles? Or can you display the 11 most recent articles that have appeared in the last week or month? If so you could add this line to your WHERE clause. It would filter your stuff by date rather than having to look all the way back to the beginning of time for matching articles. This will be especially helpful if you have a long-established Drupal site.

   AND ma.field_article_date_format_value >= (CURRENT_TIME() - INTERVAL 1 MONTH)

First, try to flip the order of the INNER JOIN operations. Second, incorporate the tid=153 into the join criterion. This MAY reduce the size of the temp table you need to sort. All together my suggestions are as follows:

    SELECT n.nid, 
           n.title, 
           ma.field_article_date_format_value, 
           ma.field_article_summary_value
      FROM node n 
INNER JOIN term_node tn            ON (n.nid=tn.nid AND tn.tid = 153) 
INNER JOIN content_type_article ma ON n.nid=ma.nid
     WHERE n.status=1 
       AND ma.field_article_date_format_value >= (CURRENT_TIME() - INTERVAL 1 MONTH)
  ORDER BY ma.field_article_date_format_value DESC 
     LIMIT 0, 11;

Those are

like image 40
O. Jones Avatar answered Sep 22 '22 03:09

O. Jones