I'm still learning about MySQL. I may be making a very basic error, and I'm prepared to be chastened here...
What this query is trying to do is select the top members from our website based on a count of the number of book and recipe reviews they have made.
I'm making a calculation of the total in the SQL query itself. The query is slow (9 seconds) and will definitely not scale considering we only have 400 members and a few thousand reviews so far and it's growing quite quickly.
I presume it's doing a full table scan here, and that the calculation is slowing it down, but I don't know of an alternative way to do this and would love some wisdom.
Here's the SQL statement:
SELECT users.*, COUNT( DISTINCT bookshelf.ID ) AS titles, COUNT( DISTINCT book_reviews.ID ) as bookreviews, COUNT( DISTINCT recipe_reviews.ID ) AS numreviews, COUNT( DISTINCT book_reviews.ID ) + COUNT( DISTINCT recipe_reviews.ID ) as reviewtotal
FROM users
LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
LEFT OUTER JOIN bookshelf ON users.ID = bookshelf.user_id
GROUP BY users.ID
ORDER BY reviewtotal DESC
LIMIT 8
Here is the EXPLANATION:
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| 1 | SIMPLE | users | index | NULL | PRIMARY | 4 | NULL | 414 | Using temporary; Using filesort |
| 1 | SIMPLE | recipe_reviews | ref | recipe_reviews_fk | recipe_reviews_fk | 5 | users.ID | 12 | |
| 1 | SIMPLE | book_reviews | ref | user_id | user_id | 5 | users.ID | 4 | |
| 1 | SIMPLE | bookshelf | ref | recipe_reviews_fk | recipe_reviews_fk | 5 | users.ID | 13 | |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
UPDATE & SOLVED:
I realized, and @recursive confirmed, that the query is the root of the problem. I'm getting Cartesian products from this. I rewrote it as a series of subqueries and the final working code is here:
SELECT *, bookreviews + recipereviews AS totalreviews
FROM (SELECT users.*,
(SELECT count(*) FROM bookshelf WHERE bookshelf.user_id = users.ID) as titles,
(SELECT count(*) FROM book_reviews WHERE book_reviews.user_id = users.ID) as bookreviews,
(SELECT count(*) FROM recipe_reviews WHERE recipe_reviews.user_id = users.ID) as recipereviews
FROM users) q
This gives me a result in milliseconds. There are also ways to do this with JOINs. See How to add together the results of several subqueries? if you want to follow this up.
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
Don't let the asterisk (*) make you think it has the same use as in SELECT * statement. No, COUNT(*) will not go through the whole table before returning the number of rows, making itself slower than COUNT(1) .
The MyISAM engine maintains a count of all rows in a table, making counts in MySQL/MyISAM spectacularly fast. However, if you've done counts with InnoDB, another popular MySQL storage engine, or with a PostgreSQL table, then you know a count query takes much longer.
for features like that, it is always helpful to work with some kind of caching...
It might already help to create sums for all users on a nightly basis and store those sums with the user. This will help a lot and speed up your search.
You should also cache this request somehow for at least a minute or five since you will execute the same request independently on whos logged in.
You might try seeing if there is an improvement from removing the DISTINCT
modifiers. Assuming the DISTINCT
ed fields are primary keys anyway, this could be causing unnecessary work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With