
Why does this WHERE clause make my query 180 times slower?

Tags:

mysql

The following query executes in 1.6 seconds:

SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;

#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)

SELECT * FROM (

#this query adds row numbers to the query within it

SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (

SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id 
    FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
  (
  SELECT fav3.product_id AS product_id, SUM(CASE 
    WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
    WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
    ELSE 0
    END) AS favorites_count
    FROM favorites fav3
GROUP BY fav3.product_id 

  ) AS fav4 ON p1.product_id=fav4.product_id
    INNER JOIN sex ON sex.product_id=p1.product_id AND
    sex.sex=0 AND
    sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY) 
    INNER JOIN shops ON shops.shop_id = p1.shop_id
    ORDER BY shop, sex.DATE, product_id
    ) AS testtable

) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)

Adding AND shops.shop_id=86 to the query (shown below in the join condition for shops) causes the query to execute in 292 seconds:

SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;

#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)

SELECT * FROM (

#this query adds row numbers to the query within it

SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (

SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id 
    FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
  (
  SELECT fav3.product_id AS product_id, SUM(CASE 
    WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
    WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
    ELSE 0
    END) AS favorites_count
    FROM favorites fav3
GROUP BY fav3.product_id 

  ) AS fav4 ON p1.product_id=fav4.product_id
    INNER JOIN sex ON sex.product_id=p1.product_id AND
    sex.sex=0 AND
    sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
    INNER JOIN shops ON shops.shop_id = p1.shop_id AND
    shops.shop_id=86
    ORDER BY shop, sex.DATE, product_id
    ) AS testtable

) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)

I would have thought limiting the shops table with AND shops.shop_id=86 would reduce execution time. Instead, execution time appears to depend upon the number of rows in the products table with products.shop_id equal to the specified shops.shop_id. There are about 34K rows in the products table with products.shop_id=86, and execution time is 292 seconds. For products.shop_id=50, there are about 28K rows, and execution time is 210 seconds. For products.shop_id=175, there are about 2K rows, and execution time is 2.8 seconds. What is going on?

EXPLAIN EXTENDED for the 1.6 second query is:

id | select_type | table      | type   | possible_keys                                  | key          | key_len | ref                       | rows   | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 1203   | 100.00   | Using where
2  | DERIVED     | <derived3> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 1203   | 100.00   |
3  | DERIVED     | sex        | ALL    | product_id_2,product_id                        | NULL         | NULL    | NULL                      | 526846 | 75.00    | Using where; Using temporary; Using filesort
3  | DERIVED     | p1         | eq_ref | PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 | PRIMARY      | 4       | mydatabase.sex.product_id | 1      | 100.00   |
3  | DERIVED     | <derived4> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 14752  | 100.00   |
3  | DERIVED     | shops      | eq_ref | PRIMARY                                        | PRIMARY      | 4       | mydatabase.p1.shop_id     | 1      | 100.00   |
4  | DERIVED     | fav3       | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 15356  | 100.00   | Using temporary; Using filesort

SHOW WARNINGS for this EXPLAIN EXTENDED is

-----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select `mydatabase`.`shops`.`shop` AS `shop`,`mydatabase`.`shops`.`shop_id` AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`fav4`.`product_id` = `mydatabase`.`sex`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`mydatabase`.`shops`.`shop_id` = `mydatabase`.`p1`.`shop_id`) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by `mydatabase`.`shops`.`shop`,`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+------

EXPLAIN EXTENDED for the 292 second query is:

id | select_type | table      | type   | possible_keys                                  | key          | key_len | ref                       | rows   | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 36     | 100.00   | Using where
2  | DERIVED     | <derived3> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 36     | 100.00   |
3  | DERIVED     | shops      | const  | PRIMARY                                        | PRIMARY      | 4       |                           | 1      | 100.00   | Using temporary; Using filesort
3  | DERIVED     | p1         | ref    | PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 | shop_id      | 4       |                           | 11799  | 100.00   |
3  | DERIVED     | <derived4> | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 14752  | 100.00   |
3  | DERIVED     | sex        | eq_ref | product_id_2,product_id                        | product_id_2 | 5       | mydatabase.p1.product_id  | 1      | 100.00   | Using where
4  | DERIVED     | fav3       | ALL    | NULL                                           | NULL         | NULL    | NULL                      | 15356  | 100.00   | Using temporary; Using filesort

SHOW WARNINGS for this EXPLAIN EXTENDED is

----+ 
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select 'shop.nordstrom.com' AS `shop`,'86' AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`fav4`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`p1`.`shop_id` = 86) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by 'shop.nordstrom.com',`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) | 
+-----

I am running MySQL client version: 5.1.56. The shops table has a primary index on shop_id:

Keyname | Type  | Unique | Packed | Column  | Cardinality | Collation | Null | Comment
PRIMARY | BTREE | Yes    | No     | shop_id | 163         | A         |      |

I have analyzed the shops table, but this did not help.

I notice that if I remove the LEFT JOIN, the difference in execution times shrinks to 0.12 seconds versus 0.28 seconds.

Cez's solution, namely to use the 1.6-second version of the query and remove irrelevant results by adding rowed_results.shop_dummy=86 to the outer query (as below), executes in 1.7 seconds. This circumvents the problem, but the mystery remains why the 292-second query is so slow.

SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;

#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)

SELECT * FROM (

#this query adds row numbers to the query within it

SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (

SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id 
    FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
  (
  SELECT fav3.product_id AS product_id, SUM(CASE 
    WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
    WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
    ELSE 0
    END) AS favorites_count
    FROM favorites fav3
GROUP BY fav3.product_id 

  ) AS fav4 ON p1.product_id=fav4.product_id
    INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
    INNER JOIN shops ON shops.shop_id = p1.shop_id
    WHERE sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY) 


    ORDER BY shop, sex.DATE, product_id
    ) AS testtable

) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7) AND
rowed_results.shop_dummy=86;
jela asked Nov 15 '12 20:11



2 Answers

After the chat room, and actually creating tables/columns to match the query, I've come up with the following query.

I started my inner-most query on the sex, products (for shop_id) and favorites tables. Since you described that ProductX at ShopA has Product ID = 1 but the same ProductX at ShopB has Product ID = 2 (example only), each product is ALWAYS unique per shop and never duplicated. That means I can get the product and shop_id WITH the count of favorites (if any) in this query while grouping on just the product_id; since shop_id won't change per product, I use MAX() to pick it up. Since you are always looking by a date of "yesterday" and gender (sex=0, female), I would have the sex table indexed on ( date, sex, product_id ). I would guess you are not adding thousands of items every day. Products obviously would have an index on product_id (the primary key), and favorites SHOULD have an index on product_id.
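
The index suggestions above could be created along these lines. This is only a sketch: the index names idx_sex_date_sex_product and idx_favorites_product are made up for illustration, and the column order simply follows the ( date, sex, product_id ) order given above.

```sql
-- Hypothetical index names; the column order follows the suggestion above.
ALTER TABLE sex
  ADD INDEX idx_sex_date_sex_product (`date`, sex, product_id);

-- Lets the join to favorites look rows up by product_id instead of scanning.
ALTER TABLE favorites
  ADD INDEX idx_favorites_product (product_id);
```

The EXPLAIN output in the question shows that sex already has indexes named product_id_2 and product_id, so it is worth checking SHOW INDEX FROM sex first to avoid creating a duplicate, and re-running EXPLAIN afterwards to see whether the optimizer actually uses the new indexes.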

From that result (alias "sxFav") we can then join directly to the sex and products tables by product_id to get any additional information you may want, such as the name of the shop, the date the product was added, the product description, etc. This result is then ordered by the shop_id the product is being sold from, the date and finally the product ID (you may consider grabbing a description column in the inner query and sorting by that instead). This results in the alias "PreQuery".

With the rows properly ordered by shop, we can now add the @MySQLVariable references to assign each product a row number, similar to what you originally attempted. However, the counter only resets back to 1 when the shop ID changes.

SELECT 
      PreQuery.*,
      @num := IF( @current_shop_id = PreQuery.shop_id, @num +1, 1 ) AS RowPerShop, 
      @current_shop_id := PreQuery.shop_id AS shop_dummy 
   from 
      ( SELECT 
              sxFav.product_id, 
              sxFav.shop_id, 
              sxFav.Favorites_Count
           from 
              ( SELECT 
                      sex.product_id,
                      MAX( p.shop_id ) shop_id,
                      SUM( CASE WHEN F.current = 1 AND F.closeted = 1 THEN 1 
                                WHEN F.current = 1 AND F.closeted = 0 THEN -1 
                                ELSE 0 END ) AS favorites_count 
                   from 
                      sex
                         JOIN products p
                            ON sex.Product_ID = p.Product_ID
                         LEFT JOIN favorites F 
                            ON sex.product_id = F.product_ID 
                   where 
                          sex.date >= subdate( now(), interval 1 day) 
                      and sex.sex = 0 
                   group by 
                      sex.product_id ) sxFav 

              JOIN sex 
                 ON sxFav.Product_ID = sex.Product_ID

              JOIN products p
                 ON sxFav.Product_ID = p.Product_ID
      order by 
         sxFav.shop_id, 
         sex.date, 
         sxFav.product_id ) PreQuery,

     ( select @num :=0, 
              @current_shop_id := 0 ) as SQLVars 

Now, if you are looking for specific "paging" information (such as 7 entries per shop), wrap the ENTIRE query above into something like...

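select * from ( entire query above ) AS Paged where RowPerShop between 1 and 7

(or between 8 and 14, 15 and 21, etc. as needed) or even

RowPerShop between RowsPerPage*(PageYouAreShowing-1)+1 and RowsPerPage*PageYouAreShowing

with PageYouAreShowing starting at 1. MySQL requires an alias on the derived table; "Paged" is an arbitrary name.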
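
As a quick check of the paging arithmetic, the bounds for a given page can be computed on their own. This is a minimal sketch assuming pages are numbered from 1 and RowPerShop starts at 1, as in the query above; the variable names @rows_per_page and @page are hypothetical.

```sql
-- Hypothetical variables: 7 rows per page, showing page 2.
SET @rows_per_page := 7, @page := 2;

-- First and last RowPerShop value that belong on the requested page.
SELECT (@page - 1) * @rows_per_page + 1 AS first_row,
       @page * @rows_per_page           AS last_row;
-- first_row = 8, last_row = 14
```

Those two values are the bounds to use in the BETWEEN condition of the wrapping query, matching the ranges 1 to 7, 8 to 14, 15 to 21 listed above.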
DRapp answered Dec 01 '22 00:12


You should move the shops.shop_id=86 condition into the JOIN condition for shops. There is no reason to put it outside the JOIN; if you do, you run the risk of MySQL joining first and filtering afterwards. A JOIN condition can do the same job that a WHERE clause does, especially if you are not referencing other tables.

....
INNER JOIN shops ON shops.shop_id = p1.shop_id AND shops.shop_id=86
....

Same thing with the sex join:

...
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
AND sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
...

Derived tables are great, but they have no indexes on them. Usually this doesn't matter since they are generally in RAM. But between filtering and sorting with no indexes, things can add up.
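
One way to give the favorites aggregate an index is to materialize it into a temporary table before running the main query. This is a sketch of that idea, not something measured against the data in the question: the table name fav_counts and the index name idx_product are made up, and the SELECT at the end only shows where the temporary table would replace the fav4 derived table.

```sql
-- Build the per-product favorites count once, with an index on product_id.
-- fav_counts and idx_product are hypothetical names.
CREATE TEMPORARY TABLE fav_counts (INDEX idx_product (product_id))
SELECT fav3.product_id AS product_id,
       SUM(CASE
             WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
             WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
             ELSE 0
           END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id;

-- Join against the indexed temporary table instead of the derived table.
SELECT p1.product_id, fav_counts.favorites_count
FROM products p1
LEFT JOIN fav_counts ON fav_counts.product_id = p1.product_id
WHERE p1.shop_id = 86;

DROP TEMPORARY TABLE fav_counts;
```

Whether this helps depends on the plan MySQL chooses, so compare the EXPLAIN output before and after. Note that a temporary table can be referenced only once within a single query.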

Note that in the second query, the one that takes much longer, the table processing order changes. The shops table is at the top in the slow query, and EXPLAIN estimates 11799 rows for the p1 table instead of 1 row in the fast query. It also uses the shop_id index instead of the primary key. That's likely where your problem is.

Fast query:

3   DERIVED p1  eq_ref  PRIMARY,shop_id,shop_id_2,product_id,shop_id_3  PRIMARY 4   mydatabase.sex.product_id   1   100.00  

Slow query:

3   DERIVED p1  ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3  shop_id 4       11799   100.00  
Brent Baisley answered Dec 01 '22 00:12