 

Distinct of two columns grouping on another column

I am trying to count the number of duplicate values across two columns, grouped by another column, in SQL Server.

Below is a sample scenario I am working on.

    DECLARE @mytable TABLE (CampName varchar(10),ID VARCHAR(10),ListName varchar(10))
    INSERT INTO @mytable
            ( CampName, ID, ListName )
    VALUES  ( 'A',   'X',   'Y' ), ( 'A',   'X',   'Y' ), 
            ( 'A',   'Y',   'Z' ), ( 'A',   'Y',   'Z' ),
            ( 'A',   'Y',   'Z' ), ( 'A',   'P',   'Q' ),
            ( 'B',   'X',   'Y' ), ( 'B',   'X',   'Y' ), 
            ( 'B',   'Y',   'Z' ), ( 'B',   'Y',   'Z' ),
            ( 'B',   'Y',   'Z' ), ( 'B',   'P',   'Q' ),
            ( 'B',   'R',   'S' ), ( 'B',   'R',   'S' )

This would result in the following table.

    CampName  ID  ListName
    ----------------------
    A         X   Y
    A         X   Y         -- Duplicate Record
    A         Y   Z
    A         Y   Z         -- Duplicate Record
    A         Y   Z         -- Duplicate Record
    A         P   Q
    B         X   Y
    B         X   Y         -- Duplicate Record
    B         Y   Z
    B         Y   Z         -- Duplicate Record
    B         Y   Z         -- Duplicate Record
    B         P   Q
    B         R   S
    B         R   S         -- Duplicate Record

I need the output as follows:

    CampName  dupcount
    ------------------
    A         3
    B         4

Basically, I need the number of duplicate (ID, ListName) pairs for each CampName, regardless of what the duplicated values are.

Let me know if I can clarify something else in this regard. Any help would be greatly appreciated.

asked Sep 08 '16 at 14:09 by Kashyap MNVL


People also ask

Does distinct work on multiple columns?

Yes. DISTINCT applies to the combination of values across all columns in the SELECT list: a row is treated as a duplicate only if every selected column matches another row.
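A small sketch of this behaviour, using Python's built-in sqlite3 module with SQLite standing in for SQL Server (the table t and its rows are hypothetical):

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; table "t" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID TEXT, ListName TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("X", "Y"), ("X", "Y"), ("Y", "Z"), ("X", "Z")],
)

# DISTINCT compares the whole SELECT list: ("X", "Y") appears twice and is
# kept once, while ("X", "Z") survives because its ListName differs.
rows = conn.execute(
    "SELECT DISTINCT ID, ListName FROM t ORDER BY ID, ListName"
).fetchall()
print(rows)  # [('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
```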

Does distinct work with GROUP BY?

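GROUP BY and DISTINCT serve different purposes. SELECT DISTINCT removes duplicate rows from the result set. GROUP BY collapses rows into groups so that aggregates such as COUNT or SUM can be computed per group. A GROUP BY with no aggregates returns the same rows as SELECT DISTINCT over the same columns, and the two can be combined, for example COUNT(DISTINCT col2) ... GROUP BY col1. Which one to use depends on whether you need per-group aggregates.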
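A sketch of the difference, again assuming SQLite via Python's sqlite3 module as a stand-in for SQL Server and a hypothetical table t:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (CampName TEXT, ID TEXT, ListName TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", "X", "Y"), ("A", "X", "Y"), ("A", "P", "Q"), ("B", "X", "Y")],
)

# Without aggregates, GROUP BY returns the same rows as SELECT DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT CampName, ID FROM t ORDER BY CampName, ID"
).fetchall()
grouped_rows = conn.execute(
    "SELECT CampName, ID FROM t GROUP BY CampName, ID ORDER BY CampName, ID"
).fetchall()

# GROUP BY is needed once an aggregate is computed per group, and DISTINCT
# can then be used inside the aggregate.
per_camp = conn.execute(
    "SELECT CampName, COUNT(*), COUNT(DISTINCT ID) FROM t "
    "GROUP BY CampName ORDER BY CampName"
).fetchall()
print(per_camp)  # [('A', 3, 2), ('B', 1, 1)]
```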

How do I get unique two column combinations in SQL?

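Use SELECT DISTINCT col1, col2 FROM mytable, which returns each unique (col1, col2) pair once. If (a, b) and (b, a) should count as the same combination, a CASE expression can first put the two values in a consistent order, and DISTINCT is then applied to the reordered pair.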
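A sketch of both variants, assuming SQLite through Python's sqlite3 module and a hypothetical table pairs. The CASE expressions are one way to normalize the order, not the only one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (col1 TEXT, col2 TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("a", "b"), ("b", "a"), ("a", "b"), ("c", "d")],
)

# Ordered combinations: (a, b) and (b, a) are different rows.
ordered = conn.execute(
    "SELECT DISTINCT col1, col2 FROM pairs ORDER BY col1, col2"
).fetchall()

# Unordered combinations: CASE puts the smaller value first, so (b, a)
# becomes (a, b) before DISTINCT is applied.
unordered = conn.execute(
    """
    SELECT DISTINCT
        CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS lo,
        CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS hi
    FROM pairs
    ORDER BY lo, hi
    """
).fetchall()
print(ordered)    # [('a', 'b'), ('b', 'a'), ('c', 'd')]
print(unordered)  # [('a', 'b'), ('c', 'd')]
```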

How do I count distinct values of multiple columns in SQL?

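To count distinct column combinations, note that some databases (SQL Server, for example) accept only a single expression inside COUNT(DISTINCT ...). You must therefore either concatenate the values, choosing a separator that cannot occur in the data: select count(distinct col1 || '-' || col2) from mytable; or use a subquery: select count(*) from (select distinct col1, col2 from mytable) as t; (|| is the standard SQL concatenation operator. SQL Server uses + or CONCAT and requires the alias on the derived table.)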
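The separator matters because different pairs can concatenate to the same string. Here is a sketch comparing the three approaches in SQLite, via Python's sqlite3 module, standing in for SQL Server (table contents are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
# ("ab", "c") and ("a", "bc") are different pairs that both concatenate to "abc".
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("ab", "c"), ("a", "bc"), ("ab", "c")],
)

# No separator: the two different pairs collide, so only 1 is counted.
no_separator = conn.execute(
    "SELECT COUNT(DISTINCT col1 || col2) FROM mytable"
).fetchone()[0]
# With a separator: "ab-c" and "a-bc" stay distinct.
with_separator = conn.execute(
    "SELECT COUNT(DISTINCT col1 || '-' || col2) FROM mytable"
).fetchone()[0]
# Subquery: DISTINCT compares the columns themselves, so no separator is needed.
subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM mytable) AS t"
).fetchone()[0]

print(no_separator, with_separator, subquery)  # 1 2 2
```

The '-' separator is only safe here because no value contains '-'. Pairs such as ('a-', 'b') and ('a', '-b') would still collide.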


2 Answers

You can use the following query:

SELECT CampName, SUM(cnt) AS dupcount
FROM (
  SELECT CampName, COUNT(*) - 1 AS cnt
  FROM @mytable
  GROUP BY CampName, ID, ListName
  HAVING COUNT(*) > 1) AS t
GROUP BY CampName

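The inner query groups the rows by CampName, ID and ListName and computes COUNT(*) - 1 (the number of extra copies) for each group. Its HAVING clause drops groups that occur only once. The outer query then sums these counts per CampName. Because of the HAVING filter, a CampName with no duplicates at all is omitted from the result rather than reported with a count of 0.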
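To check the query against the sample data, here is a sketch that runs it in SQLite through Python's sqlite3 module. SQLite has no table variables, so this sketch assumes a regular table named mytable in place of @mytable. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (CampName, ID, ListName)
ROWS = [
    ("A", "X", "Y"), ("A", "X", "Y"),
    ("A", "Y", "Z"), ("A", "Y", "Z"),
    ("A", "Y", "Z"), ("A", "P", "Q"),
    ("B", "X", "Y"), ("B", "X", "Y"),
    ("B", "Y", "Z"), ("B", "Y", "Z"),
    ("B", "Y", "Z"), ("B", "P", "Q"),
    ("B", "R", "S"), ("B", "R", "S"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (CampName TEXT, ID TEXT, ListName TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT CampName, SUM(cnt) AS dupcount
FROM (
  SELECT CampName, COUNT(*) - 1 AS cnt   -- extra copies of each (ID, ListName)
  FROM mytable
  GROUP BY CampName, ID, ListName
  HAVING COUNT(*) > 1) AS t              -- keep only duplicated pairs
GROUP BY CampName
ORDER BY CampName
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('A', 3), ('B', 4)]
```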

answered Oct 20 '22 at 18:10 by Giorgos Betsos


I believe the number of distinct (ID, ListName) combinations needs to be subtracted from the total row count of each CampName group to get the correct result.

SELECT t.CampName,
       COUNT(*) - COUNT(DISTINCT 'ColOne' + ID + 'ColTwo' + ListName) AS dupcount
FROM yourTable t
GROUP BY CampName

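This query uses a trick: it concatenates the ID and ListName columns, which are both text, into a single value that acts as a pseudo-group key. This is needed because COUNT(DISTINCT ...) in SQL Server accepts only one expression, but two columns have to be considered. The 'ColOne' and 'ColTwo' literals act as separators, so that, for example, ID = 'ab', ListName = 'c' and ID = 'a', ListName = 'bc' do not produce the same string.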
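Here is a sketch that runs this query on the sample data in SQLite via Python's sqlite3 module. It makes two assumptions: the table is named mytable rather than yourTable, and SQLite's || replaces the SQL Server + operator. The last step adds a hypothetical row with a NULL ID to show a caveat of the concatenation trick. Concatenating NULL with text yields NULL (in SQL Server too, under the default CONCAT_NULL_YIELDS_NULL ON setting), and COUNT(DISTINCT ...) ignores NULLs.

```python
import sqlite3

# Sample rows from the question: (CampName, ID, ListName)
ROWS = [
    ("A", "X", "Y"), ("A", "X", "Y"),
    ("A", "Y", "Z"), ("A", "Y", "Z"),
    ("A", "Y", "Z"), ("A", "P", "Q"),
    ("B", "X", "Y"), ("B", "X", "Y"),
    ("B", "Y", "Z"), ("B", "Y", "Z"),
    ("B", "Y", "Z"), ("B", "P", "Q"),
    ("B", "R", "S"), ("B", "R", "S"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (CampName TEXT, ID TEXT, ListName TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Total rows minus distinct (ID, ListName) keys per CampName.
QUERY = """
SELECT t.CampName,
       COUNT(*) - COUNT(DISTINCT 'ColOne' || ID || 'ColTwo' || ListName) AS dupcount
FROM mytable t
GROUP BY t.CampName
ORDER BY t.CampName
"""
sample_result = conn.execute(QUERY).fetchall()
print(sample_result)  # [('A', 3), ('B', 4)]

# Caveat: a NULL makes the whole key NULL, COUNT(DISTINCT ...) skips it,
# so a single non-duplicated row in hypothetical campaign 'C' counts as 1.
conn.execute("INSERT INTO mytable VALUES ('C', NULL, 'Y')")
with_null = conn.execute(QUERY).fetchall()
print(with_null)  # [('A', 3), ('B', 4), ('C', 1)]
```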

Reference: Quora: In SQL, how to I count DISTINCT over multiple columns?

answered Oct 20 '22 at 19:10 by Tim Biegeleisen