Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't MySQL use the primary key on JOIN plus ORDER?

Here's a neat one for you (MySQL, obviously):

# Setting things up
DROP DATABASE IF EXISTS index_test_gutza;
CREATE DATABASE index_test_gutza;
USE index_test_gutza;

CREATE TABLE customer_order (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY(id)
);
INSERT INTO customer_order
    (id, invoice)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

CREATE TABLE customer_invoice (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice_no MEDIUMINT UNSIGNED DEFAULT NULL,
    invoice_pdf LONGBLOB,
    PRIMARY KEY(id)
);
INSERT INTO customer_invoice
    (id, invoice_no)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

# Ok, here's the beef
EXPLAIN
    SELECT co.id
    FROM customer_order AS co;

EXPLAIN
    SELECT co.id
    FROM customer_order AS co
    ORDER BY co.id;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice
    ORDER BY co.id;

There are four EXPLAIN statements at the bottom. The first two result in exactly what you'd expect:

+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | co    | index | NULL          | PRIMARY | 3       | NULL |    5 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

The third one is already interesting -- notice how the primary key in customer_order is not used any more:

+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 |             |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+

The fourth one, however, is the zinger -- simply adding the ORDER BY on the primary key leads to filesort on customer_order (which is to be expected, given that it was already bewildered above):

+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 | Using filesort |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index    |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+

Filesort! And that while I'm never using anything except the primary key in the customer_order table for ordering, and the primary key in the customer_invoice table for the JOIN. So then, in the name of all that is good and right, why does it switch to filesort all of a sudden?! And more importantly, how do I avoid this? For the record, I will gladly accept a documented answer explaining why this cannot be avoided (if that's the case.)

As you probably suspect by now, this is actually happening in production, and although the tables are by no means huge (only hundreds of records), the filesort on the invoice table (which contains a PDF file) is killing the server when I run queries similar to the one above (which I need in order to know which orders have been issued invoices, and which weren't).

Before you ask, I designed the database, and I thought I was safe storing the PDF files in that table because I never ever need any search queries on it -- I always have its primary key at hand!

Update (comments synopsis)

Here's a synopsis of what was suggested in the comments below, so you don't have to read all of that:

  • *You should add a key on customer_order.invoice* -- I actually tried that in production, it makes no difference (as it shouldn't)
  • You should use USE INDEX -- tried that, didn't work. I also tried FORCE INDEX -- no result either (no change whatsoever)
  • You oversimplified the use case, we need the actual production query -- I might have stripped it a bit too much in the first iteration, so I updated it (I just added , ci.invoice_no in the SELECT for the last couple of queries). For the record, if anybody's really curious, here's the production query, exactly as it is (this retrieves the last page of orders):
SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id!="" as A,
    corder.payment_received as pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id=corder.buyer
LEFT JOIN invoice as invoice ON invoice.id=corder.invoice
LEFT JOIN invoice as proforma ON proforma.id=corder.proforma
ORDER BY
    id DESC 
LIMIT 400, 20;

The query above (which, again, is exactly what I run in production) takes about 14 seconds to run. Here's the simplified query, as shown in the use case above, executed on production:

SELECT
    corder.id,
    invoice.invoice_no
FROM
    corder
LEFT JOIN invoice ON invoice.id=corder.invoice
ORDER BY
    corder.id DESC 
LIMIT 400, 20;

This one takes 13 seconds to run. Be advised the LIMIT makes no difference whatsoever as long as we're talking about the last page of results (which we are). That is, there's absolutely no significant difference between retrieving the last 12 results or all 412 results when filesort is involved.

Conclusion

ypercube's answer is not only correct, but unfortunately it seems to be the only legitimate one. I tried to further separate conditions from fields, because the SELECT * FROM corder subquery can end up involving a lot of data, if corder itself contains LONGBLOBs (and duplicating the fields from the main query in the subquery is inelegant), but unfortunately it doesn't seem to work:

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
WHERE corder.id IN (
    SELECT id
    FROM corder
    ORDER BY id DESC
    LIMIT 400,20
)
ORDER BY
    corder.id DESC;

This fails, with the following error message:

ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

I'm using MySQL 5.1.61, which is reasonably recent in the 5.1 family (and apparently this is not supported in 5.5.x either).

like image 539
Bogdan Stăncescu Avatar asked Nov 01 '12 14:11

Bogdan Stăncescu


1 Answers

Can you try this version (it basically gets first the 420 rows of the corder table, keeps the 20 of them and then does the 3 outer joins):

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    ( SELECT * 
      FROM corder
      ORDER BY
        id DESC 
      LIMIT 400, 20
    )
    AS corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
ORDER BY
    corder.id DESC ;
like image 67
ypercubeᵀᴹ Avatar answered Sep 22 '22 09:09

ypercubeᵀᴹ