Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

query with LEFT JOIN and ORDER BY...LIMIT slow, uses Filesort

I have the following query:

SELECT 
    fruit.date,
    fruit.name,
    fruit.reason,
    fruit.id,
    fruit.notes,
    food.name
FROM
    fruit
 LEFT JOIN
    food_fruits AS ff ON fruit.fruit_id = ff.fruit_id AND ff.type='fruit'
 LEFT JOIN
    food USING (food_id)
 LEFT JOIN
    fruits_sour AS fs ON fruits.id = fs.fruit_id
WHERE
    (fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
        AND (fruit.`status` = 'Rotten')
        AND (fruit.location = 'USA')
        AND (fruit.size = 'medium')
        AND (fs.fruit_id IS NULL)
ORDER BY `food.name` asc
LIMIT 15 OFFSET 0

And all the indexes you could ever want, including the following which are being used:

fruit        - fruit_filter (size, status, location, date)
food_fruits  - food_type (type)
food         - food (id)
fruits_sour  - fruit_id (fruit_id)

I even have indexes which I thought would work better which are not being used:

food_fruits  - fruit_key (fruit_id, type)
food         - id_name (food_id, name)

The ORDER BY clause is causing a temporary table and filesort to be used, unfortunately. Without that, the query runs lickety-split. How can I get this query to not need to filesort? What am I missing?

EDIT:

The Explain: The Explain

like image 299
MirroredFate Avatar asked Oct 19 '22 21:10

MirroredFate


2 Answers

The reason for this is your ORDER BY clause which is done on the field which is not part of index used for this query. The engine can run the query using the fruit_filter index, but then it has to sort on the different field, and that's when filesort comes into play (which basically means "sort without using index", thanks to the reminder in comments).

I don't know what times you are getting as a result, but if the difference is a lot, then I would create a temporary table with intermediate results and sorted it afterwards.

(By the way, i am not sure why do you use LEFT JOIN instead of INNER JOIN and why do you use food_fruits - answered in comments)

UPDATE.

Try subquery approach, may be (untested), which splits sorting from pre-filtering:

SELECT
    fr.date,
    fr.name,
    fr.reason,
    fr.id,
    fr.notes,
    food.name
FROM
  (
  SELECT 
    fruit.date,
    fruit.name,
    fruit.reason,
    fruit.id,
    fruit.notes,
  FROM
    fruit
  LEFT JOIN
    fruits_sour AS fs ON fruit.id = fs.fruit_id
  WHERE
    (fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
        AND (fruit.`status` = 'Rotten')
        AND (fruit.location = 'USA')
        AND (fruit.size = 'medium')
        AND (fs.fruit_id IS NULL)
  ) as fr
LEFT JOIN
    food_fruits AS ff ON fr.fruit_id = ff.fruit_id AND ff.type='fruit'
LEFT JOIN
    food USING (food_id)  
ORDER BY `food.name` asc
LIMIT 15 OFFSET 0
like image 101
Ashalynd Avatar answered Oct 22 '22 23:10

Ashalynd


Your ORDER BY ... LIMIT clauses require some sorting, you know. The trick to optimizing performance is to ORDER BY ... LIMIT the minimal set of columns, and then build your full result set based on the chosen fifteen rows. So let's try for a minimal set of columns in a subquery.

     SELECT fruit.id,
            food.name
       FROM fruit
  LEFT JOIN food_fruits AS ff   ON fruit.fruit_id = ff.fruit_id 
                               AND ff.type='fruit'
  LEFT JOIN food USING (food_id)
  LEFT JOIN fruits_sour AS fs ON fruits.id = fs.fruit_id
      WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
        AND fruit.`status` = 'Rotten'
        AND fruit.location = 'USA'
        AND fruit.size = 'medium'
        AND fs.fruit_id IS NULL
   ORDER BY food.name ASC
      LIMIT 15 OFFSET 0

This query gives you the fifteen top ids and their names.

I would add id to the end of your existing fruit_filter index to give (size, status, location, date, id). That will make it into a compound covering index, and allow your filtering query to be satisfied entirely from the index.

Other than that, it's going to be hard to optimize this using more or different indexes because so much of the query is driven by other factors, like the LEFT JOIN ... IS NULL join-fail criterion you have applied.

Then you can join this subquery to your fruits table to pull the full result set.

That will look like this when it's all done.

SELECT fruit.date,
       fruit.name,
       fruit.reason,
       fruit.id,
       fruit.notes,
       list.name
  FROM fruit
  JOIN (
               SELECT fruit.id,
                      food.name
                 FROM fruit
            LEFT JOIN food_fruits AS ff    ON fruit.fruit_id = ff.fruit_id
                                          AND ff.type='fruit'
            LEFT JOIN food USING (food_id)
            LEFT JOIN fruits_sour AS fs ON fruits.id = fs.fruit_id
                WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
                  AND fruit.`status` = 'Rotten'
                  AND fruit.location = 'USA'
                  AND fruit.size = 'medium'
                  AND fs.fruit_id IS NULL
             ORDER BY food.name ASC
                LIMIT 15 OFFSET 0
       ) AS list ON fruit.id = list.id
 ORDER BY list.name

Do you see how this goes? In the subquery you sling around just enough data to identify which tiny subset of rows you want to retrieve. Then, you join that subquery to your main table to pull out all your data. Limiting the row length in the stuff you have to sort helps performance because MySQL can sort it its sort buffer, rather than having to do a more elaborate and slower sort / merge operation. (But, you can't tell from EXPLAIN whether it will do this or not.)

like image 20
O. Jones Avatar answered Oct 22 '22 23:10

O. Jones