I have table Foo with 200 million records and table Bar with 1000 records, they are connected many-to-one. There are indexes for columns Foo.someTime and Bar.someField. Also in Bar 900 records have someField of 1, 100 have someField of 2.
(1) This query executes immediately:
mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where f.someTime between '2008-08-14' and '2018-08-14' and b.someField = 1 limit 20;
...
20 rows in set (0.00 sec)
(2) This one takes just forever (the only change is b.someField = 2):
mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where f.someTime between '2008-08-14' and '2018-08-14' and b.someField = 2 limit 20;
(3) But if I drop out where clause on someTime than it also executes immediately:
mysql> select * from Foo f inner join Bar b on f.table_id = b.table_id where b.someField = 2 limit 20;
...
20 rows in set (0.00 sec)
(4) Also I can speed it up by forcing the index usage:
mysql> select * from Foo f inner join Bar b force index(someField) on f.table_id = b.table_id where f.someTime between '2008-08-14' and '2018-08-14' and b.someField = 2 limit 20;
...
20 rows in set (0.00 sec)
Here is the explain on query (2) (which takes forever)
+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| 1 | SIMPLE | g | range | bar_id,bar_id_2,someTime | someTime | 4 | NULL | 95022220 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY,someField,bar_id | PRIMARY | 4 | db.f.bar_id | 1 | Using where |
+----+-------------+-------+--------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
Here is the explain on (4) (which has force index)
+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
| 1 | SIMPLE | t | ref | someField | someField | 1 | const | 92 | |
| 1 | SIMPLE | g | ref | bar_id,bar_id_2,someTime | bar_id | 4 | db.f.foo_id | 10558024 | Using where |
+----+-------------+-------+------+-------------------------------+-----------+---------+--------------------------+----------+-------------+
So the question is how to teach MySQL to use right index? The query is generated by ORM and is not limited to only these two fields. And also it would be nice to avoid changing the query much (though I'm not sure that inner join fits here).
UPDATE:
mysql> create index index_name on Foo (bar_id, someTime);
After that the query (2) executes in 0.00 sec.
Indexes can help improve the performance of a nested-loop join in several ways. The biggest benefit often comes when you have a clustered index on the joining column in one of the tables. The presence of a clustered index on a join column frequently determines which table SQL Server chooses as the inner table.
Because MySQL uses only one index for each table in one execution, we need to decide using index for join or group by. Let give GROUP BY a chance and see what happens. According to the explain chart, MySQL will join `answer` table into `user` table.
The indexes must include all the join columns, in the same key order on both tables. A merge join on (col1, col2) can use indexes on (col1, col2) or (col2, col1) , but the key order must be the same for both tables.
To use the WHERE clause to perform the same join as you perform using the INNER JOIN syntax, enter both the join condition and the additional selection condition in the WHERE clause. The tables to be joined are listed in the FROM clause, separated by commas. This query returns the same output as the previous example.
If you create compound index for foo(table_id, sometime)
, it should help a lot. This is because server will be able to narrow down result set by table_id
first, and then by sometime
.
Note that when using LIMIT
, server does not guarantee which rows will be fetched if many qualify to your WHERE constraint. Technically, every execution can give you slightly different result. If you want to avoid ambiguity, you should always use ORDER BY
when you use LIMIT
. However, that also means you should be more careful in creating appropriate indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With