Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining table twice makes the query slow

Tags:

join

mysql

My problem is that my query is very slow when use JOIN on the same table twice.

I want to retrieve all the products from a given category. But since the product can be in multiple categories I also want to get the (c.canonical) category that should provide the URL base. Therefore I have 2 extra JOIN on categories AS c and categories_products AS cp2.

Original query

SELECT p.product_id
FROM products AS p
JOIN categories_products AS cp
    ON p.product_id = cp.product_id
JOIN product_variants AS pv
    ON pv.product_id = p.product_id
WHERE cp.category_id = 2
    AND p.status = 2
GROUP BY p.product_id
ORDER BY cp.product_sortorder ASC
LIMIT 0, 40

EXPLAIN

| id | select_type | table |   type |          possible_keys |                    key | key_len |                     ref | rows |                                        extra |
|----|-------------|-------|--------|------------------------|------------------------|---------|-------------------------|------|----------------------------------------------|
|  1 |      SIMPLE |    cp |    ref | FK_categories_products | FK_categories_products |       4 |                   const | 1074 | Using where; Using temporary; Using filesort |
|  1 |      SIMPLE |     p | eq_ref |                PRIMARY |                PRIMARY |       4 | superlove.cp.product_id |    1 |                                    Using where |
|  1 |      SIMPLE |    pv |    ref |    FK_product_variants |    FK_product_variants |       4 |  superlove.p.product_id |    1 |                                    Using where |    

Slow query

SELECT p.product_id, c.category_id
FROM products AS p
JOIN categories_products AS cp
    ON p.product_id = cp.product_id
JOIN categories_products AS cp2        // Extra line
    ON p.product_id = cp2.product_id   // Extra line
JOIN categories AS c                   // Extra line
    ON cp2.category_id = c.category_id // Extra line
JOIN product_variants AS pv
    ON pv.product_id = p.product_id
WHERE cp.category_id = 2
    AND p.status = 2
    AND c.canonical = 1                // Extra line
GROUP BY p.product_id
ORDER BY cp.product_sortorder ASC
LIMIT 0, 40

EXPLAIN

| id | select_type | table |   type |          possible_keys |                    key | key_len |                      ref | rows |                                        extra |
|----|-------------|-------|--------|------------------------|------------------------|---------|--------------------------|------|----------------------------------------------|
|  1 |      SIMPLE |     c |    ALL |                PRIMARY |                 (null) |  (null) |                   (null) |  221 | Using where; Using temporary; Using filesort |
|  1 |      SIMPLE |   cp2 |    ref | FK_categories_products | FK_categories_products |       4 |  superlove.c.category_id |   33 |                                              |
|  1 |      SIMPLE |     p | eq_ref |                PRIMARY |                PRIMARY |       4 | superlove.cp2.product_id |    1 |                                  Using where |
|  1 |      SIMPLE |    pv |    ref |    FK_product_variants |    FK_product_variants |       4 |   superlove.p.product_id |    1 |                                  Using where |
|  1 |      SIMPLE |    cp |    ref | FK_categories_products | FK_categories_products |       4 |                    const | 1074 |                                  Using where |
like image 261
Cudos Avatar asked Dec 11 '25 17:12

Cudos


1 Answers

The MySQL optimizer seems to have trouble with this query. I get the impression that only rather few products would be in the requested category, but there would likely be many canonical categories. However, the optimizer apparently cannot tell that cp.category_id = 2 is a stronger condition than c.canonical = 1, so it starts the new query with c instead of cp, leading to a lot of superfluous rows along the way.

Providing data to the optimizer

Your first attempt should be trying to provide the optimizer with the required data: using the ANALYZE TABLE command, you can collect information about key distribution. For this to work, you'd have to have suitable keys in place. So perhaps you should add a key on categories.canonical. Then MySQL would know that there are (if I understand you correctly) only two distinct values for that column, and perhaps even how many rows in each. With a bit of luck, that would tell it that using c.canonical = 1 as the starting point would be a poor choice.

Forcing join order

If that does not help, then I suggest you force the order using STRAIGHT_JOIN. In particular, you might want to force cp as the first table, just as your original (and fast) query had it. If that solves the problem, you can stick to that solution. If not, then you should provide a new EXPLAIN output, so we can see where that approach fails.

Schema considerations

One more thing to consider: your question implies that for every product, there is exactly one canonical category associated with it. But your database schema does not reflect that fact. You might want to consider ways to modify your schema to reflect that fact. For example, you could have a column called canonical_category_id in products table, and use categories_products for non-canonical categories only. If you use such a setup, you might want to create a VIEW which joins products to all their categories, both canonical and non-canonical ones, using a UNION like this:

CREATE VIEW products_all_categories AS
SELECT product_id, canonical_category_id AS category_id
FROM products
UNION ALL
SELECT product_id, category_id
FROM categories_products

You could use this instead of categories_products in those places where you don't care whether a category is canonical or not. You could even rename the table and name the view categories_products instead, so that your existing queries work as they used to. You should add an index on the two columns from products used in this query. Perhaps even two indices, one for either order of these columns.

Not sure whether this whole setup would be acceptable in your application. Not sure whether it would really bring the intended speed gain. In the end, you might be forced to maintain redundant data, like a products.canonical column in addition to a reference to the canonical category in the categories_products table. I know redundant data is ugly from a design point of view, but for the sake of performance it might be necessary in order to avoid long computations. At least on a RDBMS which doesn't support materialized views. You could probably use triggers to keep data consistent, though I have no actual experience there.

like image 142
MvG Avatar answered Dec 13 '25 07:12

MvG



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!