SELECT with query variables not using INDEXes

Question

I was playing around (out of interest) with retrieving a tree of nodes in a simple adjacency list with a recursive query using local variables.

The solution i have so far is fun but i wonder (and this is my only question) why MySQL refuses to use any INDEX to optimize this query. Shouldn't MySQL be able to lookup the nearest child(s) by using an INDEX?

I'm curious why MySQL doesn't. Even when i use FORCE INDEX the execution plan doesn't change.

This is the query so far, with 5 being the ID of the parent node:

SELECT 
  @last_id := id AS id,
  parent_id,
  name,
  @depth := IF(parent_id = 5, 1, @depth + 1) AS depth
FROM 
  tree FORCE INDEX (index_parent_id, PRIMARY, index_both),
  (SELECT @last_id := 5, @depth := -1) vars
WHERE id = 5 OR parent_id = @last_id OR parent_id = 5

Try live example at SQLfiddle

Note that the reason can't be the small dataset, because the behaviour doesn't change when i specify FORCE INDEX (id) or FORCE INDEX (parent_id) or FORCE INDEX (id, parent_id) ...

The docs say:

You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.

There must be something that renders the query unable to use the INDEX, but i don't understand what it is.

Disclaimer: I know there are different ways to store and retrieve hierarchical data in SQL. I know about the nested sets model. I'm not looking for an alternative implementation. I'm not looking for nested sets.

I also know the query in itself is nuts and produces wrong results.

I just want to understand (in detail) why MySQL is not using an INDEX in this case.

Shlomi Noach · Accepted Answer

The reason lies within the use of OR conditions in the WHERE clause.

To illustrate, try running the query again, this time with only the id = 5 condition, and get (EXPLAIN output):

+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+
| id | select_type | table      | type   | possible_keys      | key     | key_len | ref   | rows | Extra          |
+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL               | NULL    | NULL    | NULL  |    1 |                |
|  1 | PRIMARY     | tree       | const  | PRIMARY,index_both | PRIMARY | 4       | const |    1 |                |
|  2 | DERIVED     | NULL       | NULL   | NULL               | NULL    | NULL    | NULL  | NULL | No tables used |
+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+

And again, this time with only the parent_id = @last_id OR parent_id = 5 condition, and get:

+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+
| id | select_type | table      | type   | possible_keys   | key  | key_len | ref  | rows | Extra          |
+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL            | NULL | NULL    | NULL |    1 |                |
|  1 | PRIMARY     | tree       | ALL    | index_parent_id | NULL | NULL    | NULL |   10 | Using where    |
|  2 | DERIVED     | NULL       | NULL   | NULL            | NULL | NULL    | NULL | NULL | No tables used |
+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+

MySQL is not too good with handling multiple indexes in the same query. Things are slightly better with AND conditions; one is more likely to see an index_merge optimization than an index union optimization.

Things are improving as versions advance, but I've tested you query on version 5.5, which is at current latest production version, and the results are as you describe.

To explain why this is difficult, consider: two different indexes will answer for two different conditions of the query. One will answer for id = 5, the other for parent_id = @last_id OR parent_id = 5 (BTW no issue with the OR inside the latter, since both terms are handled from within the same index).

There is no single index that can answer for both, and therefore the FORCE INDEX instruction is ignored. See, FORCE INDEX says MySQL has to use an index over a table scan. It does not imply it has to use more than one index over a table scan.

So MySQL follows the rules of the documentation here. But why is this so complicated? Because to answer using both indexes, MySQL has to gather results from both, store one's aside in some temporary buffer while managing the second. Then is has to go over that buffer to filter out identical rows (it is possible that some row fits all conditions). And then to scan that buffer so as to return the results.

But wait, that buffer is in itself not indexed. Filtering duplicates is not an obvious task. So MySQL prefers to work on the original table and do the scan there, and avoid all that mess.

Of course this is solvable. The engineers at Oracle may yet improve this (recently they have been working hard at improving query execution plans), but I don't know if this is on the TODO task, or if it has a high priority.

SELECT with query variables not using INDEXes

Tags:

indexing

mysql

hierarchical-data

adjacency-list

query-variables

Kaii

1 Answers

Shlomi Noach

Recent Activity

Donate For Us

SELECT with query variables not using INDEXes

Tags:

indexing

mysql

hierarchical-data

adjacency-list

query-variables

Kaii

1 Answers

Shlomi Noach

Related questions

Recent Activity

Donate For Us