Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server - getting duplicate rows with different queries

Tags:

sql

sql-server

I'm trying to get a count of child records (addresses) for each customer. I have 2 queries and I'm wondering if they're the same:

SELECT 
    a.AddressId,  c.CustomerID, COUNT(*) AS NumDuplicates 
FROM 
    Customers C
INNER JOIN 
    Addresses a ON c.AddressID = a.AddressID 
GROUP BY 
    c.CustomerID, a.AddressId 
ORDER BY 
    NumDuplicates DESC


SELECT 
    c.CustomerID,
    (SELECT COUNT(*) 
     FROM Addresses a 
     WHERE a.AddressID = c.AddressID) AS AddressCount
FROM 
    Customers c 
ORDER BY 
    AddressCount desc

If they're not, what's the difference? If they are which is more efficient?

like image 865
Eitan Avatar asked Mar 13 '23 17:03

Eitan


1 Answers

The two queries are different, because the first only returns customers that have at least one match in the address table. The second returns all customers, even those with no match and having AddressId is NULL.

The equivalent first query is:

SELECT c.CustomerID, COUNT(a.AddressId) AS NumDuplicates
FROM Customers C LEFT JOIN
     Addresses a
     ON c.AddressID = a.AddressID 
GROUP BY c.CustomerID
ORDER BY NumDuplicates DESC;

As for performance, you should try them out. There are reasons why either might be faster. The second avoids having to do aggregation, but does have a correlated subquery. However, SQL Server has some tricks for speeding joins and aggregation. I would guess that the correlated subquery version is faster, but I might be wrong for your data and server.

like image 94
Gordon Linoff Avatar answered Mar 16 '23 06:03

Gordon Linoff