SQL Server - getting duplicate rows with different queries

Question

I'm trying to get a count of child records (addresses) for each customer. I have 2 queries and I'm wondering if they're the same:

SELECT 
    a.AddressId,  c.CustomerID, COUNT(*) AS NumDuplicates 
FROM 
    Customers C
INNER JOIN 
    Addresses a ON c.AddressID = a.AddressID 
GROUP BY 
    c.CustomerID, a.AddressId 
ORDER BY 
    NumDuplicates DESC


SELECT 
    c.CustomerID,
    (SELECT COUNT(*) 
     FROM Addresses a 
     WHERE a.AddressID = c.AddressID) AS AddressCount
FROM 
    Customers c 
ORDER BY 
    AddressCount desc

If they're not, what's the difference? If they are which is more efficient?

Gordon Linoff · Accepted Answer

The two queries are different, because the first only returns customers that have at least one match in the address table. The second returns all customers, even those with no match and having AddressId is NULL.

The equivalent first query is:

SELECT c.CustomerID, COUNT(a.AddressId) AS NumDuplicates
FROM Customers C LEFT JOIN
     Addresses a
     ON c.AddressID = a.AddressID 
GROUP BY c.CustomerID
ORDER BY NumDuplicates DESC;

As for performance, you should try them out. There are reasons why either might be faster. The second avoids having to do aggregation, but does have a correlated subquery. However, SQL Server has some tricks for speeding joins and aggregation. I would guess that the correlated subquery version is faster, but I might be wrong for your data and server.

SQL Server - getting duplicate rows with different queries

Tags:

sql

sql-server

Eitan

1 Answers

Gordon Linoff

Recent Activity

Donate For Us

SQL Server - getting duplicate rows with different queries

Tags:

sql

sql-server

Eitan

1 Answers

Gordon Linoff

Related questions

Recent Activity

Donate For Us