I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But, I was looking at these two queries. Will they return different result sets? I understand that the sorting process would commence differently, but I believe they will return the same results with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually the first query will be less efficient (HAVING
applied after WHERE
).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Group, count, etc.
4. Exclude records from the first result set if they are presented in the second
Second query
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Exclude records from the first result set if they are presented in the second
4. Group, count, etc.
The order of steps 1,2 is not important, mysql can choose whatever it thinks is better. The important difference is in steps 3,4. Having is applied after GROUP BY
. Grouping is usually more expensive than joining (excluding records can be considering as join operation in this case), so the fewer records it has to group, the better performance.
You have already answers that the two queries will show same results and various opinions for which one is more efficient.
My opininion is that there will be a difference in efficiency (speed), only if the optimizer yields with different plans for the 2 queries. I think that for the latest MySQL versions the optimizers are smart enough to find the same plan for either query so there will be no difference at all but off course one can test and see either the excution plans with EXPLAIN or running the 2 queries against some test tables.
I would use the second version in any case, just to play safe.
Let me add that:
COUNT(*)
is usually more efficient than COUNT(notNullableField)
in MySQL. Until that is fixed in future MySQL versions, use COUNT(*)
where applicable.Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
NOT IN
) sub-results before applying GROUP BY
.Using LEFT JOIN / NULL
:
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS
:
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With