MySQL not using index with JOIN, WHERE and ORDER

We have two tables resembling a simple tag-record structure as follows (in reality it's much more complex but this is the essence of the problem):

tag (A.a) | recordId (A.b)
1         | 1
2         | 1
2         | 2
3         | 2
....

and

recordId (B.b) | recordData (B.c)
1              | 123
2              | 666
3              | 1246

The problem is fetching the records that carry a specific tag, ordered by their data. The obvious way to do this is a simple join, with indexes on A (primary key (A.a, A.b), plus (A.b)) and on B (primary key (B.b), plus (B.b, B.c)):

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;

However, this gives the unpleasant result of a filesort:

+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref       | rows | Extra                                        |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
|  1 | SIMPLE      | A     | ref  | PRIMARY,b     | PRIMARY | 4       | const     |   94 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | B     | ref  | PRIMARY,b     | b       | 4       | booli.A.b |    1 | Using index                                  | 
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
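To see where the filesort comes from, here is a minimal Python sketch of the plan in the EXPLAIN output above. It is only a model under stated assumptions: the rows are hypothetical values invented so that b order and c order differ, a sorted list and a dict stand in for the InnoDB indexes, and the name `filter_then_sort` is illustrative.

```python
# Hypothetical sample rows, invented for illustration (not from the question).
# Table A as its PRIMARY KEY (a, b): entries kept sorted by (a, b).
A_PK = sorted([(44, 1), (44, 2), (44, 3), (7, 2)])
# Table B as its PRIMARY KEY (b): recordId b -> recordData c.
B_PK = {1: 1246, 2: 123, 3: 666}


def filter_then_sort(tag):
    """Model of the chosen plan: ref on A, one lookup into B per row, then a sort."""
    # 1. Read the A entries for a = tag (a linear filter stands in for the
    #    index range scan); they come out in (a, b) order, i.e. ordered by b.
    a_rows = [(a, b) for (a, b) in A_PK if a == tag]
    # 2. Join: look up c in B for every matching b.
    joined = [(a, b, B_PK[b]) for (a, b) in a_rows if b in B_PK]
    # 3. Nothing so far is ordered by c, so an extra sort pass is required;
    #    this is the step EXPLAIN reports as "Using filesort".
    return sorted(joined, key=lambda row: row[2])


print(filter_then_sort(44))  # [(44, 2, 123), (44, 3, 666), (44, 1, 1246)]
```

After step 2 the rows are [(44, 1, 1246), (44, 2, 123), (44, 3, 666)], ordered by b, which is why step 3 cannot be skipped.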

Using a huge and extremely redundant "materialized view" we get fairly decent performance, but at the expense of complicating the business logic, which we would like to avoid, especially since the A and B tables already are MVs (and are needed for other queries, in fact for the same queries using a UNION).

create temporary table C engine=innodb as (select A.a, A.b, B.c from A join B on A.b = B.b);
alter table C add (primary key(a, b), index idx_C_a_c (a, c));
explain select a, b, c from C where a = 44 order by c;
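The denormalized table works because a and c now live in the same table, so a composite index on (a, c), such as idx_C_a_c in the full listing below, stores the entries for one tag already ordered by c. A minimal Python sketch of that idea, assuming hypothetical rows and using a sorted list with `bisect` as a stand-in for the B-tree; `index_range_scan` is an illustrative name, not a MySQL function. It also shows that a range condition on c only moves the starting point of the scan.

```python
import bisect

# Hypothetical rows of the denormalized table C: (a, b, c).
C_ROWS = [(44, 1, 1246), (44, 2, 123), (44, 3, 666), (7, 2, 123)]

# Model of the secondary index idx_C_a_c on (a, c). InnoDB secondary index
# entries also carry the primary-key columns, so each entry is (a, c, b).
IDX_A_C = sorted((a, c, b) for (a, b, c) in C_ROWS)


def index_range_scan(tag, min_c=None):
    """Return (a, b, c) rows for a = tag [and c > min_c], already ordered by c."""
    if min_c is None:
        # (tag,) sorts before every (tag, c, b), so this lands on the first entry for tag.
        start = bisect.bisect_left(IDX_A_C, (tag,))
    else:
        # Skip every entry with c <= min_c inside the tag's range.
        start = bisect.bisect_right(IDX_A_C, (tag, min_c, float("inf")))
    rows = []
    for a, c, b in IDX_A_C[start:]:
        if a != tag:
            break  # past the end of the a = tag range
        rows.append((a, b, c))  # emitted in c order: no sort step needed
    return rows


print(index_range_scan(44))       # [(44, 2, 123), (44, 3, 666), (44, 1, 1246)]
print(index_range_scan(44, 678))  # [(44, 1, 1246)]
```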

A further complication is that we also have conditions on the B table, such as range filters:

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 AND B.c > 678 order by c;

But we are confident we can handle this if the filesort problem goes away.

Does anyone know why the simple join in code block 3 above won't use the index for sorting, and whether we can work around the problem without creating a new MV?

Below is the full SQL listing that we are using for testing.

DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
CREATE TEMPORARY TABLE A (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY(a, b), INDEX idx_A_b (b)) ENGINE=INNODB;
CREATE TEMPORARY TABLE B (b INT NOT NULL, c INT NOT NULL, d VARCHAR(5000) NOT NULL DEFAULT '', PRIMARY KEY(b), INDEX idx_B_c (c), INDEX idx_B_b (b, c)) ENGINE=INNODB;

DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT IGNORE INTO A SELECT RAND()*100, RAND()*10000;
                INSERT IGNORE INTO B SELECT RAND()*10000, RAND()*1000, '';
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$
DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;
DROP PROCEDURE prc_filler;

CREATE TEMPORARY TABLE C ENGINE=INNODB AS (SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b);
ALTER TABLE C ADD (PRIMARY KEY(a, b), INDEX idx_C_a_c (a, c));

EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE 1 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b where A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT a, b, c FROM C WHERE a = 44 ORDER BY c;
-- Added after Quassnoi's comments
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM  B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM  B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
asked Aug 04 '09 13:08 by Paso


2 Answers

I reproduced this query using your scripts:

SELECT  A.a, A.b, B.c
FROM    A
JOIN    B
ON      A.b = B.b
WHERE   a = 44
ORDER BY
        c

It completes in 0.0043 seconds (instantly), returns 930 rows, and yields this plan:

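+----+-------------+-------+--------+---------------+---------+---------+----------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref      | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+----------------------------------------------+
|  1 | SIMPLE      | A     | ref    | PRIMARY       | PRIMARY | 4       | const    | 1610 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | B     | eq_ref | PRIMARY       | PRIMARY | 4       | test.A.b |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+----------------------------------------------+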

It's quite efficient for such a query.

For such a query, you cannot use a single index for both filtering and sorting: the filter is on A.a and the sort is on B.c, and an index cannot span columns from two tables.

See this article in my blog for more detailed explanations:

  • Choosing index

If you expect your query to return few records, you should use the index on A for filtering and then sort using filesort (like the query above does).

If you expect it to return many records (and you LIMIT them), you need to use an index for sorting and then filter:

CREATE INDEX ix_a_b ON A (b);
CREATE INDEX ix_b_c ON B (c);

SELECT  *
FROM    B FORCE INDEX (ix_b_c)
JOIN    A
ON      A.b = B.b
ORDER BY
        B.c
LIMIT 10;

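+----+-------------+-------+-------+---------------+--------+---------+----------+------+-------------+
| id | select_type | table | type  | possible_keys | key    | key_len | ref      | rows | Extra       |
+----+-------------+-------+-------+---------------+--------+---------+----------+------+-------------+
|  1 | SIMPLE      | B     | index |               | ix_b_c | 4       |          |    2 | Using index |
|  1 | SIMPLE      | A     | ref   | ix_a_b        | ix_a_b | 4       | test.B.b |    4 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+----------+------+-------------+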
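The trade-off between the two strategies can be illustrated with a rough Python model that counts how many index entries each one examines before it can return LIMIT rows. Everything here is an assumption for illustration: the data is synthetic (tag 1 is on every record, tag 2 on only three), sets, dicts and sorted lists stand in for the indexes, the function names are hypothetical, and the counts are not MySQL's optimizer cost model.

```python
import bisect

# Synthetic data (hypothetical): 1000 records; tag 1 is on every record,
# tag 2 only on three. c = b * 7919 % 1000 is a permutation of 0..999.
A_PK = sorted([(1, b) for b in range(1000)] + [(2, 5), (2, 500), (2, 995)])
A_SET = set(A_PK)  # stands in for a primary-key probe on A (a, b)
B_PK = {b: b * 7919 % 1000 for b in range(1000)}
IX_B_C = sorted((c, b) for b, c in B_PK.items())  # model of ix_b_c on B (c)


def filter_then_sort(tag, limit, min_c=None):
    """Index on A for filtering, then sort: every matching row is examined."""
    examined, rows = 0, []
    start = bisect.bisect_left(A_PK, (tag,))
    for a, b in A_PK[start:]:
        if a != tag:
            break  # end of the a = tag range
        examined += 1
        c = B_PK[b]  # lookup into B by primary key
        if min_c is None or c > min_c:
            rows.append((a, b, c))
    rows.sort(key=lambda row: row[2])  # the filesort
    return rows[:limit], examined


def sort_then_filter(tag, limit, min_c=None):
    """Walk ix_b_c in c order and probe A; stop as soon as LIMIT rows are found."""
    examined, rows = 0, []
    # A range condition c > min_c just moves the starting point of the scan.
    start = 0 if min_c is None else bisect.bisect_right(IX_B_C, (min_c, float("inf")))
    for c, b in IX_B_C[start:]:
        examined += 1
        if (tag, b) in A_SET:
            rows.append((tag, b, c))
            if len(rows) == limit:
                break  # LIMIT reached: the rest of the index is never read
    return rows, examined


for tag in (1, 2):
    r1, n1 = filter_then_sort(tag, 10)
    r2, n2 = sort_then_filter(tag, 10)
    print(tag, r1 == r2, n1, n2)
# 1 True 1000 10
# 2 True 3 1000
```

Both functions return the same rows; only the amount of work differs. With a tag on every record, walking the sort index stops after 10 entries, while with a rare tag it reads the whole index and sorting a handful of filtered rows is much cheaper, in line with the advice above.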
answered Oct 17 '22 07:10 by Quassnoi


select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;

If you alias the columns, does that help? Example:

 SELECT 
 T1.a AS colA, 
 T2.b AS colB, 
 T2.c AS colC 
 FROM A AS T1 
 JOIN B AS T2 
 ON (T1.b = T2.b) 
 WHERE 
 T1.a = 44 
 ORDER BY colC;

The only changes I made were:

  • I put the join conditions in parentheses
  • The join conditions and where conditions are based on table columns
  • The ORDER BY condition is based on the resulting table column
  • I aliased the result columns and the queried tables to (hopefully) make it clearer which one I was using at each point (and clearer to the server: your original query leaves the columns unqualified in two places).

I know your real data is more complex, but I assume that you provided a simple version of the query because the problem is at that simple level.

answered Oct 17 '22 09:10 by Anthony