
MySQL index optimization with Subquery vs Left Joins

I have two queries that serve the same purpose. Each has properties I would like to combine into a single query, but I have been unable to do so.

QUERY 1 - Gives me exactly the results I want. Slow (~0.700 sec)

QUERY 2 - Gives me a lot of rows that I ignore and skip over. Fast (~0.005 sec)

My goal is to modify QUERY 2 so that it drops all null-price rows except one for each item. I can't seem to do this without taking a performance hit, which comes down to my lack of experience with, and understanding of, index use in MySQL.

QUERY 1

Uses a poorly designed subquery which does not allow the use of indexing across tbl_sale (e) which contains 10k rows.

SELECT b.id, b.sv, b.description, der.store_id, f.name, der.price
FROM tbl_watch AS a
LEFT JOIN tbl_item AS b ON a.item_id = b.id
LEFT JOIN (
    SELECT c.store_id, d.flyer_id, e.item_id, e.price
    FROM tbl_storewatch AS c,
         tbl_storeflyer AS d FORCE INDEX ( storebeg_ndx ),
         tbl_sale AS e
    WHERE c.user_id = '$user_id'
    AND (
        d.store_id = c.store_id
        AND d.date_beg = '20121206'
        )
    AND e.flyer_id = d.flyer_id
) AS der ON a.item_id = der.item_id
LEFT JOIN tbl_store AS f ON der.store_id = f.id
WHERE a.user_id = '$user_id'
ORDER BY b.description ASC

Here is the EXPLAIN for QUERY 1

id  select_type  table       type    possible_keys  key            key_len  ref           rows  Extra
1   PRIMARY      a           ref     user_item_ndx  user_item_ndx  4        const         30    Using index; Using temporary; Using filesort
1   PRIMARY      b           eq_ref  PRIMARY        PRIMARY        4        a.item_id     1
1   PRIMARY      <derived2>  ALL     NULL           NULL           NULL     NULL          300
1   PRIMARY      f           eq_ref  PRIMARY        PRIMARY        4        der.store_id  1
2   DERIVED      c           ref     user_ndx       user_ndx       4                      6
2   DERIVED      e           ALL     NULL           NULL           NULL     NULL          9473  Using join buffer
2   DERIVED      d           eq_ref  storebeg_ndx   storebeg_ndx   8        c.store_id    1     Using where

QUERY 2

Uses only left joins, which is very efficient (with the exception of the ORDER BY). An index is used on every join. This query returns all possible matches for every item in tbl_watch. Here is the query:

SELECT b.id, b.sv, b.description, c.store_id, f.name, e.price
FROM tbl_watch AS a
LEFT JOIN tbl_item AS b ON a.item_id = b.id
LEFT JOIN tbl_storewatch AS c ON c.user_id = '$user_id'
LEFT JOIN tbl_storeflyer AS d ON d.store_id = c.store_id
    AND d.date_beg = '$s_date'
LEFT JOIN tbl_sale AS e ON e.item_id = a.item_id
    AND e.flyer_id = d.flyer_id 
LEFT JOIN tbl_store as f ON d.store_id = f.id
WHERE a.user_id = '$user_id'
ORDER BY b.description ASC

Here is the EXPLAIN for the query:

id  select_type     table   type    possible_keys           key             key_len     ref                     rows    Extra
1   SIMPLE          a       ref     user_item_ndx           user_item_ndx   4           const                   6       Using index; Using temporary; Using filesort
1   SIMPLE          b       eq_ref  PRIMARY                 PRIMARY         4           a.item_id               1   
1   SIMPLE          c       ref     user_ndx                user_ndx        4           const                   2   
1   SIMPLE          d       eq_ref  storebeg_ndx,storendx   storebeg_ndx    8           c.store_id,const        1   
1   SIMPLE          e       eq_ref  itemflyer_ndx           itemflyer_ndx   8           a.item_id,d.flyer_id    1   
1   SIMPLE          f       eq_ref  PRIMARY                 PRIMARY         4           d.store_id              1   

How can I modify QUERY 2 (more efficient) to give me just the rows I need like in QUERY 1 to work with?

Thanks Mike

ridgeback asked Dec 08 '12 03:12



1 Answer

I think this query will give you what you want:

select a.id, a.sv, a.description, c.id, c.name, b.price
  from 
    tbl_item a left outer join tbl_sale b on (a.id=b.item_id)
      left outer join tbl_storeflyer d on (b.flyer_id=d.flyer_id and d.date_beg = '20120801')
      left outer join tbl_store c on (d.store_id = c.id)
      left outer join tbl_storewatch x on (c.id = x.store_id)
      left outer join tbl_watch y on (a.id = y.item_id);

With NULLs involved, you're likely going to need some left joins. The alternative is to use a union, which may be faster in MySQL:

 select a.id, a.sv, a.description, c.id as store_id, c.name, b.price
  from
    tbl_item a,
    tbl_sale b,
    tbl_storeflyer d,
    tbl_store c,
    tbl_storewatch x,
    tbl_watch y
  where
    a.id = b.item_id and
    b.flyer_id = d.flyer_id and
    d.store_id = c.id and
    c.id = x.store_id and
    a.id = y.item_id and
    d.date_beg = '20120801'
union
 select a.id, a.sv, a.description, null as store_id, null as name, null as price
  from
    tbl_item a
  where
    a.id not in (select b.item_id from tbl_sale b);

You might experiment with writing the second half of the union as a left outer join instead of a 'not in' subquery. Which is faster depends on how your version of MySQL optimizes it.
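
As a rough sketch of that left outer join variant (untested against your schema), the second half of the union could be written as an anti-join: join every item to tbl_sale and keep only the rows where no sale matched. It uses only the tables and columns already shown above. It returns the same rows as the 'not in' form as long as tbl_sale.item_id is never NULL, which is an assumption here, not something the question states.

```sql
-- Items with no row at all in tbl_sale, padded with NULL store/price columns.
-- Assumes tbl_sale.item_id is never NULL; if it can be, 'not in' would
-- return no rows while this form still returns the unmatched items.
select a.id, a.sv, a.description, null as store_id, null as name, null as price
  from
    tbl_item a
      left outer join tbl_sale b on (a.id = b.item_id)
  where
    b.item_id is null;
```

Running EXPLAIN on both forms should show which one your MySQL version handles better.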

PlexQ answered Nov 14 '22 23:11