I am trying to count the number of duplicate values across two columns, grouped by another column, in SQL Server.
Below is a sample scenario I am working on.
DECLARE @mytable TABLE (CampName varchar(10),ID VARCHAR(10),ListName varchar(10))
INSERT INTO @mytable
( CampName, ID, ListName )
VALUES ( 'A', 'X', 'Y' ), ( 'A', 'X', 'Y' ),
( 'A', 'Y', 'Z' ), ( 'A', 'Y', 'Z' ),
( 'A', 'Y', 'Z' ), ( 'A', 'P', 'Q' ),
( 'B', 'X', 'Y' ), ( 'B', 'X', 'Y' ),
( 'B', 'Y', 'Z' ), ( 'B', 'Y', 'Z' ),
( 'B', 'Y', 'Z' ), ( 'B', 'P', 'Q' ),
( 'B', 'R', 'S' ), ( 'B', 'R', 'S' )
This would result in the following table.
CampName ID ListName
-------------------------------------
A X Y
A X Y -- Duplicate Record
A Y Z
A Y Z -- Duplicate Record
A Y Z -- Duplicate Record
A P Q
B X Y
B X Y -- Duplicate Record
B Y Z
B Y Z -- Duplicate Record
B Y Z -- Duplicate Record
B P Q
B R S
B R S -- Duplicate Record
I need the output as follows:
CampName dupcount
-------------------
A 3
B 4
Basically, I need to find the number of duplicate (ID, ListName) rows for each CampName, regardless of which values are duplicated.
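A minimal sketch of one way to produce exactly this output. It is not taken from the answers below and is offered only as an illustration: number the rows inside each (CampName, ID, ListName) group with ROW_NUMBER(), then count every row numbered 2 or higher. Those are the rows marked "Duplicate Record" in the table above. It assumes SQL Server 2008 or later (for the multi-row VALUES insert) and reuses the question's sample data.

```sql
-- Sample data from the question.
DECLARE @mytable TABLE (CampName varchar(10), ID varchar(10), ListName varchar(10));
INSERT INTO @mytable (CampName, ID, ListName)
VALUES ('A', 'X', 'Y'), ('A', 'X', 'Y'),
       ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'P', 'Q'),
       ('B', 'X', 'Y'), ('B', 'X', 'Y'),
       ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'P', 'Q'),
       ('B', 'R', 'S'), ('B', 'R', 'S');

-- Number the rows within each (CampName, ID, ListName) group.
-- The rows in a group are identical, so their order does not matter.
WITH numbered AS (
    SELECT CampName,
           ROW_NUMBER() OVER (PARTITION BY CampName, ID, ListName
                              ORDER BY (SELECT NULL)) AS rn
    FROM @mytable
)
-- Every row after the first one in its group is a duplicate.
SELECT CampName,
       SUM(CASE WHEN rn > 1 THEN 1 ELSE 0 END) AS dupcount
FROM numbered
GROUP BY CampName;
```

Using SUM(CASE ...) instead of WHERE rn > 1 keeps campaigns that have no duplicates in the result, with a dupcount of 0.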
Let me know if I can clarify something else in this regard. Any help would be greatly appreciated.
DISTINCT applies to the combination of values across all columns in the SELECT list, not to each column separately.
GROUP BY and DISTINCT serve different purposes. DISTINCT only removes duplicate rows from a result, while GROUP BY collapses rows into groups so that aggregates such as COUNT(*) can be computed per group. For a plain list of unique combinations either one works; when you need per-group counts, as in this question, GROUP BY is the natural choice.
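To illustrate the difference on the question's sample data (a sketch that repeats the question's @mytable definition): both queries below return the same seven (CampName, ID, ListName) combinations, but only the GROUP BY version can also report how often each combination occurs.

```sql
-- Sample data from the question.
DECLARE @mytable TABLE (CampName varchar(10), ID varchar(10), ListName varchar(10));
INSERT INTO @mytable (CampName, ID, ListName)
VALUES ('A', 'X', 'Y'), ('A', 'X', 'Y'),
       ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'P', 'Q'),
       ('B', 'X', 'Y'), ('B', 'X', 'Y'),
       ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'P', 'Q'),
       ('B', 'R', 'S'), ('B', 'R', 'S');

-- DISTINCT: one row per unique (CampName, ID, ListName) combination.
SELECT DISTINCT CampName, ID, ListName
FROM @mytable;

-- GROUP BY: the same combinations, plus how often each one occurs.
SELECT CampName, ID, ListName, COUNT(*) AS occurrences
FROM @mytable
GROUP BY CampName, ID, ListName;
```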
To select distinct combinations from two columns, you can use a CASE expression.
But when we want to count distinct column combinations, we must either concatenate the values, which is clumsy and requires choosing a separator that cannot occur in the data: select count(distinct col1 || '-' || col2) from mytable; or use a subquery: select count(*) from (select distinct col1, col2 from mytable) as t;
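The snippets above use the standard || operator, which older SQL Server versions do not support (they use + or CONCAT() instead). SQL Server also requires an alias on a derived table. Below is a sketch of both approaches adapted to SQL Server 2012 or later (for CONCAT), counting the distinct (ID, ListName) pairs in the question's sample data. The '|' separator is an assumption and must not occur in the data. CONCAT also treats NULL as an empty string, so (NULL, 'Y') and ('', 'Y') would count as the same pair, whereas the subquery version keeps them apart.

```sql
-- Sample data from the question.
DECLARE @mytable TABLE (CampName varchar(10), ID varchar(10), ListName varchar(10));
INSERT INTO @mytable (CampName, ID, ListName)
VALUES ('A', 'X', 'Y'), ('A', 'X', 'Y'),
       ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'P', 'Q'),
       ('B', 'X', 'Y'), ('B', 'X', 'Y'),
       ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'P', 'Q'),
       ('B', 'R', 'S'), ('B', 'R', 'S');

-- Option 1: concatenate with a separator.
-- Assumes '|' never appears in ID or ListName; CONCAT treats NULL as ''.
SELECT COUNT(DISTINCT CONCAT(ID, '|', ListName)) AS distinct_pairs
FROM @mytable;

-- Option 2: count the rows of a DISTINCT derived table.
-- SQL Server requires the derived table to have an alias (here d).
SELECT COUNT(*) AS distinct_pairs
FROM (SELECT DISTINCT ID, ListName FROM @mytable) AS d;
```

Both queries return 4 for this data: the pairs (X, Y), (Y, Z), (P, Q) and (R, S).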
You can use the following query:
SELECT CampName, SUM(cnt) AS dupcount
FROM (
    SELECT CampName, COUNT(*) - 1 AS cnt
    FROM @mytable
    GROUP BY CampName, ID, ListName
    HAVING COUNT(*) > 1
) AS t
GROUP BY CampName;
The inner query uses a HAVING clause to filter out non-duplicate entries. It also calculates the number of duplicate records per (ID, ListName) combination as COUNT(*) - 1, that is, every copy after the first. The outer query simply sums those duplicates per CampName.
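Running the inner query on its own shows the intermediate result. This is a sketch using the question's sample data, with an ORDER BY added only to make the output easy to read: one row per duplicated combination, with cnt holding the number of copies beyond the first.

```sql
-- Sample data from the question.
DECLARE @mytable TABLE (CampName varchar(10), ID varchar(10), ListName varchar(10));
INSERT INTO @mytable (CampName, ID, ListName)
VALUES ('A', 'X', 'Y'), ('A', 'X', 'Y'),
       ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'Y', 'Z'), ('A', 'P', 'Q'),
       ('B', 'X', 'Y'), ('B', 'X', 'Y'),
       ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'Y', 'Z'), ('B', 'P', 'Q'),
       ('B', 'R', 'S'), ('B', 'R', 'S');

-- Inner query of the answer, run on its own: one row per duplicated
-- (CampName, ID, ListName) combination; cnt = copies beyond the first.
SELECT CampName, ID, ListName, COUNT(*) - 1 AS cnt
FROM @mytable
GROUP BY CampName, ID, ListName
HAVING COUNT(*) > 1
ORDER BY CampName, ID, ListName;
```

It returns five rows (A/X/Y 1, A/Y/Z 2, B/R/S 1, B/X/Y 1, B/Y/Z 2), and summing cnt per CampName gives A = 3 and B = 4. Because COUNT(*) - 1 is 0 for a combination that occurs only once, the HAVING clause does not change the sums. Its only effect is that a campaign with no duplicates at all is missing from the final result instead of appearing with 0.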
I believe that the number of distinct combinations of ID and ListName needs to be subtracted from the total row count of each CampName group to get the correct result.
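Written as a formula, with R_c denoting the rows of campaign c (counted with repetition), and checked against the sample data: campaign A has 6 rows and 3 distinct (ID, ListName) pairs, and campaign B has 8 rows and 4.

```latex
\mathrm{dupcount}(c) = |R_c| - \bigl|\{(r.\mathrm{ID},\ r.\mathrm{ListName}) : r \in R_c\}\bigr|
\qquad
\mathrm{dupcount}(A) = 6 - 3 = 3, \quad \mathrm{dupcount}(B) = 8 - 4 = 4
```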
SELECT t.CampName,
       COUNT(*) - COUNT(DISTINCT 'ColOne' + ID + 'ColTwo' + ListName) AS dupcount
FROM @mytable AS t
GROUP BY t.CampName;
This query uses a trick: it concatenates the ID and ListName columns, which are both text, into a single value that acts as a pseudo-group key, with the literals 'ColOne' and 'ColTwo' serving as separators. This is needed because in SQL Server COUNT(DISTINCT ...) accepts only a single expression, but here two columns need to be considered.
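One caveat of the concatenation trick is shown in the sketch below. It uses hypothetical data: campaign C and its NULL ListName are not part of the question. With the default CONCAT_NULL_YIELDS_NULL ON setting, the concatenated value is NULL whenever ID or ListName is NULL, and COUNT(DISTINCT ...) ignores NULLs, so such rows are counted as duplicates. Grouping on the columns themselves, as the first answer does, avoids this because GROUP BY puts NULLs into a single group.

```sql
DECLARE @mytable TABLE (CampName varchar(10), ID varchar(10), ListName varchar(10));
-- Hypothetical data: A has one real duplicate; C has a single row
-- whose ListName is NULL (so C has no duplicates).
INSERT INTO @mytable (CampName, ID, ListName)
VALUES ('A', 'X', 'Y'), ('A', 'X', 'Y'), ('A', 'P', 'Q'),
       ('C', 'X', NULL);

-- Concatenation: 'ColOne' + 'X' + 'ColTwo' + NULL is NULL, and
-- COUNT(DISTINCT ...) skips NULLs, so C reports 1 duplicate.
SELECT CampName,
       COUNT(*) - COUNT(DISTINCT 'ColOne' + ID + 'ColTwo' + ListName) AS dupcount
FROM @mytable
GROUP BY CampName;

-- Grouping on the columns (first answer, HAVING removed so C is listed):
-- GROUP BY treats the NULLs as one group, so C correctly reports 0.
SELECT CampName, SUM(cnt) AS dupcount
FROM (
    SELECT CampName, COUNT(*) - 1 AS cnt
    FROM @mytable
    GROUP BY CampName, ID, ListName
) AS t
GROUP BY CampName;
```

Both queries report A = 1. The first reports C = 1 and the second reports C = 0.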
Reference: Quora: In SQL, how to I count DISTINCT over multiple columns?