Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I tell the MySQL Optimizer to use the index on a derived table?

Suppose you have a query like this...

SELECT T.TaskID, T.TaskName, TAU.AssignedUsers
FROM `tasks` T
    LEFT OUTER JOIN (
        SELECT TaskID, GROUP_CONCAT(U.FirstName, ' ',
            U.LastName SEPARATOR ', ') AS AssignedUsers
        FROM `tasks_assigned_users` TAU
            INNER JOIN `users` U ON (TAU.UserID=U.UserID)
        GROUP BY TaskID
    ) TAU ON (T.TaskID=TAU.TaskID)

Multiple people can be assigned to a given task. The purpose of this query is to show one row per task, but with the people assigned to the task in a single column

Now... suppose you have the proper indexes setup on tasks, users, and tasks_assigned_users. The MySQL Optimizer will still not use the TaskID index when joining tasks to the derived table. WTF?!?!?

So, my question is... how can you make this query use the index on tasks_assigned_users.TaskID? Temporary tables are lame, so if that's the only solution... the MySQL Optimizer is stupid.

Indexes used:

  • tasks
    • PRIMARY - TaskID
  • users
    • PRIMARY - UserID
  • tasks_assigned_users
    • PRIMARY - (TaskID,UserID)
    • Additional index UNIQUE - (UserID,TaskID)

EDIT: Also, this page says that derived tables are executed/materialized before joins occur. Why not re-use the keys to perform the join?

EDIT 2: MySQL Optimizer won't let you put index hints on derived tables (presumably because there are no indexes on derived tables)

EDIT 3: Here is a really nice blog post about this: http://venublog.com/2010/03/06/how-to-improve-subqueries-derived-tables-performance/ Notice that Case #2 is the solution I'm looking for, but it appears that MySQL does not support this at this time. :(

EDIT 4: Just found this: "As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):... During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Seems promising...

like image 761
BMiner Avatar asked Jan 18 '12 19:01

BMiner


2 Answers

There is a solution to this in MySQL Server 5.6 - the preview release (at the time of this writing).

http://dev.mysql.com/doc/refman/5.6/en/from-clause-subquery-optimization.html

Although, I'm not sure if the MySQL Optimizer will re-use indexes that already exist when it "adds indexes to the derived table"

Consider the following query:

SELECT * FROM t1 JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;

The documentation says: "The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan."

OK, that's great, but does the optimizer re-use indexes from t2? In other words, what if an index existed for t2.f1? Does this index get re-used, or does the optimizer recreate this index for the derived table? Who knows?

EDIT: The best solution until MySQL 5.6 is to create a temporary table, create an index on that table, and then run the SELECT query on the temp table.

like image 125
BMiner Avatar answered Oct 03 '22 03:10

BMiner


The problem I see is that by doing a subquery there is no underlying indexed table. If you are having a performance I'd do the grouping at the end, something like this:

SELECT T.TaskID, T.TaskName, GROUP_CONCAT(U.FirstName, ' ', U.LastName SEPARATOR ', ') AS AssignedUsers
FROM `tasks` T
    LEFT OUTER JOIN  `tasks_assigned_users` TAU ON (T.TaskID=TAU.TaskID)
    INNER JOIN `users` U ON (TAU.UserID=U.UserID)
GROUP BY T.TaskID, T.TaskName
like image 20
Luis Avatar answered Oct 03 '22 03:10

Luis