I'm trying to get a count of child records (addresses) for each customer. I have 2 queries and I'm wondering if they're the same:
SELECT
a.AddressId, c.CustomerID, COUNT(*) AS NumDuplicates
FROM
Customers C
INNER JOIN
Addresses a ON c.AddressID = a.AddressID
GROUP BY
c.CustomerID, a.AddressId
ORDER BY
NumDuplicates DESC
SELECT
c.CustomerID,
(SELECT COUNT(*)
FROM Addresses a
WHERE a.AddressID = c.AddressID) AS AddressCount
FROM
Customers c
ORDER BY
AddressCount desc
If they're not, what's the difference? If they are which is more efficient?
The two queries are different, because the first only returns customers that have at least one match in the address table. The second returns all customers, even those with no match and having AddressId
is NULL.
The equivalent first query is:
SELECT c.CustomerID, COUNT(a.AddressId) AS NumDuplicates
FROM Customers C LEFT JOIN
Addresses a
ON c.AddressID = a.AddressID
GROUP BY c.CustomerID
ORDER BY NumDuplicates DESC;
As for performance, you should try them out. There are reasons why either might be faster. The second avoids having to do aggregation, but does have a correlated subquery. However, SQL Server has some tricks for speeding joins and aggregation. I would guess that the correlated subquery version is faster, but I might be wrong for your data and server.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With