So I have this query, written by someone else, that I'm trying to refactor. It pulls some features/materials for an item (shoes, generally).
There are a lot of products, and thus a whole lot of join-table entries, but only a few features are available for them. I'm thinking there has to be a way to get these features without touching the "big" list of items. I have also heard that DISTINCT is to be avoided, but I don't have a statement that can replace the DISTINCT here.
According to my logs, I'm getting slow result times:
Query_time: 7 Lock_time: 0 Rows_sent: 32 Rows_examined: 5362862
Query_time: 8 Lock_time: 0 Rows_sent: 22 Rows_examined: 6581994
As the log says, the query sometimes takes 7 or 8 seconds, and it seems to examine over 5 million rows each time.
That may be due to other load occurring at the same time, because here are the selects run on the database directly from the mysql command line:
mysql> SELECT DISTINCT features.FeatureId, features.Name
FROM features, itemsfeatures, items
WHERE items.FlagStatus != 'U'
AND items.TypeId = '13'
AND features.Type = 'Material'
AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name |
+-----------+--------------------+
| 40 | Alligator |
| 41 | Burnished Calfskin |
| 42 | Calfskin |
| 59 | Canvas |
| 43 | Chromexcel |
| 44 | Cordovan |
| 57 | Cotton |
| 45 | Crocodile |
| 58 | Deerskin |
| 61 | Eel |
| 46 | Italian Leather |
| 47 | Lizard |
| 48 | Nappa |
| 49 | NuBuck |
| 50 | Ostrich |
| 51 | Patent Leather |
| 60 | Rubber |
| 52 | Sharkskin |
| 53 | Silk |
| 54 | Suede |
| 56 | Veal |
| 55 | Woven |
+-----------+--------------------+
22 rows in set (0.00 sec)
mysql> select count(*) from features;
+----------+
| count(*) |
+----------+
| 122 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from itemsfeatures;
+----------+
| count(*) |
+----------+
| 38569 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
| 8656 |
+----------+
1 row in set (0.00 sec)
explain SELECT DISTINCT features.FeatureId, features.Name FROM features, itemsfeatures, items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId ORDER BY features.Name;
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | features | ref | PRIMARY,Type | Type | 33 | const | 21 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | itemsfeatures | ref | FeatureId | FeatureId | 4 | sherman_live.features.FeatureId | 324 | Using index; Distinct |
| 1 | SIMPLE | items | ALL | TypeId,FlagStatus | NULL | NULL | NULL | 8656 | Using where; Distinct; Using join buffer |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
3 rows in set (0.04 sec)
Edit:
Here are sample results without the DISTINCT (but with a LIMIT, since otherwise it just hangs) for comparison:
SELECT features.FeatureId, features.Name FROM features, itemsfeatures, items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId ORDER BY features.Name limit 10;
+-----------+-----------+
| FeatureId | Name |
+-----------+-----------+
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
| 40 | Alligator |
+-----------+-----------+
10 rows in set (23.30 sec)
Here's the same query using a GROUP BY instead of a SELECT DISTINCT:
SELECT features.FeatureId, features.Name FROM features, itemsfeatures, items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId group by features.name ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name |
+-----------+--------------------+
| 40 | Alligator |
| 41 | Burnished Calfskin |
| 42 | Calfskin |
| 59 | Canvas |
| 43 | Chromexcel |
| 44 | Cordovan |
| 57 | Cotton |
| 45 | Crocodile |
| 58 | Deerskin |
| 61 | Eel |
| 46 | Italian Leather |
| 47 | Lizard |
| 48 | Nappa |
| 49 | NuBuck |
| 50 | Ostrich |
| 51 | Patent Leather |
| 60 | Rubber |
| 52 | Sharkskin |
| 53 | Silk |
| 54 | Suede |
| 56 | Veal |
| 55 | Woven |
+-----------+--------------------+
22 rows in set (13.28 sec)
I'm asking because I'm trying to understand the general problem of how to replace bad SELECT DISTINCT queries, in addition to the slowness that this specific query tends to cause.
I'm wondering whether the replacement for a select distinct is generally a group by (although in this case that isn't a comprehensive solution since it's still slow)?
Looks like you're missing a JOIN condition linking itemsfeatures to items. It's more obvious if you write the query using explicit JOIN operations.
SELECT DISTINCT f.FeatureId, f.Name
FROM features f
INNER JOIN itemsfeatures ifx
ON f.FeatureID = ifx.FeatureID
INNER JOIN items i
ON ifx.ItemID = i.ItemID /* This is the part you're missing */
WHERE i.FlagStatus != 'U'
AND i.TypeId = '13'
AND f.Type = 'Material'
ORDER BY f.Name;
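On the general question of replacing SELECT DISTINCT: when the duplicates only come from joining to a "many" side, one common alternative is to ask whether a matching row exists instead of joining and de-duplicating afterwards. The following is a sketch, not a tested fix. It assumes, like the join above, that itemsfeatures has an ItemId column pointing at items.ItemId, and that FeatureId is unique in features (the PRIMARY key shown in the EXPLAIN output suggests so, but that is an assumption).

```sql
-- Sketch only: assumes itemsfeatures.ItemId references items.ItemId
-- and that features.FeatureId is unique. Adjust names to the real schema.
SELECT f.FeatureId, f.Name
FROM features f
WHERE f.Type = 'Material'
  AND EXISTS (
        -- True as soon as one qualifying item carries this feature,
        -- so each feature appears at most once and no DISTINCT is needed.
        SELECT 1
        FROM itemsfeatures ifx
        INNER JOIN items i
                ON i.ItemId = ifx.ItemId
        WHERE ifx.FeatureId = f.FeatureId
          AND i.TypeId = '13'
          AND i.FlagStatus != 'U'
      )
ORDER BY f.Name;
```

Either corrected form may return fewer rows than the original query did. Without the join condition, the item filters never restricted which features came back, so every Material feature with any itemsfeatures row was listed. Running EXPLAIN on both versions against the real data is the way to see which one the optimizer handles better.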