Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count distinct values across more than one column

Tags:

sql

sql-server

I need to count distinct values that have been spread across more than one column. I have a method but suspect there is a more elegant way.

Example input:

year project staff 1 staff 2 staff 3 cost
2022 A Ian 100
2023 A Jim Anne 200
2024 A Anne Jim Peter 300
2023 B Anne Sue 400
2024 B Dave 500

Expected output:

project number_distinct_staff total_cost
A 4 600
B 3 900

Current method:

SELECT project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(cost) AS total_cost
FROM (
    SELECT project, staff_1 AS staff, cost AS cost FROM table UNION ALL
    SELECT project, staff_2 AS staff, NULL AS cost FROM table UNION ALL
    SELECT project, staff_3 AS staff, NULL AS cost FROM table
) AS sub
GROUP BY project

This approach feels clumsy to me. I am worried that the asymmetric handling of columns (like cost in the example) makes the process vulnerable to double counting.

Is there a better way?

like image 596
Simon.S.A. Avatar asked Sep 16 '25 23:09

Simon.S.A.


2 Answers

Please try the following solution leveraging a Table Value Constructor.

For the refence: The Table Value Constructor (aka the VALUES keyword)

Basically, we are converting columns to rows via VALUES (...)

db-fiddle

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (project CHAR(1), staff_1 VARCHAR(20), staff_2 VARCHAR(20), staff_3 VARCHAR(20), cost INT);
INSERT @tbl (project, staff_1, staff_2, staff_3, cost) VALUES
('A', NULL,  'Ian',  NULL, 100),
('A', 'Jim', 'Anne', NULL, 200),
('A', 'Anne', 'Jim', 'Peter', 300),
('B', 'Anne', 'Sue', NULL, 400),
('B', NULL,   NULL, 'Dave', 500);
-- DDL and sample data population, end

SELECT t1.project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(t1.cost) AS total_cost
FROM @tbl AS t
CROSS APPLY (VALUES
    (project, staff_1, cost),
    (project, staff_2, NULL),
    (project, staff_3, NULL)) AS t1(project, staff, cost)
GROUP BY t1.project;

Output

project num_distinct_staff total_cost
A 4 600
B 3 900
like image 183
Yitzhak Khabinsky Avatar answered Sep 19 '25 16:09

Yitzhak Khabinsky


First off, your solution is correct you just had a little error,
see my dbFiddle

-- your solution after correction
SELECT project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(cost) AS total_cost
FROM (
    SELECT project, staff_1 as staff, cost FROM tbl UNION ALL
    SELECT project, staff_2, NULL AS staff FROM tbl UNION ALL
    SELECT project, staff_3, NULL AS staff FROM tbl
) AS sub
GROUP BY project

If you want to try something other than your solution or the other answer, another way is to use both string_agg and string_split like this

select t.project,
       count(distinct value) as number_distinct_staff,
       min(t.costs) as total_cost
from   ( select project,
                sum(cost) as costs,
                string_agg(staff_1, ',') + ',' + string_agg(staff_2, ',') + ',' + string_agg(staff_3, ',') as staffs
         from   tbl
         group by project
       )  t
  cross apply string_split(t.staffs, ',')
group by t.project

Here is the dbFiddle

And the result is

project number_distinct_staff total_cost
A 4 600
B 3 900
like image 44
GuidoG Avatar answered Sep 19 '25 16:09

GuidoG