Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize values within a group using SQL

Tags:

sql

postgresql

I will need your help with this one. I want to normalize numbers within groups. Say I have this dataset:

A  B  C
-------
0  1  3
1  1  0
1  1  2
1  1  4
1  2  1
1  2  1

I want to group these rows by A and B, then normalize the values of C within its group, i.e. summing up all C's of the group and then dividing each C by that sum. In the above example, I would expect this result:

A  B   C
---------
0  1   1            // sum of C's in group is 1, 1/1=1
1  1   0            // sum of C's in group is 6, 0/6=0
1  1  1/3           // ...
1  1  2/3           // ...
1  2  1/2           // sum of C's in group is 2, 1/2=1/2
1  2  1/2           // ...

Division by zero can be handled separately. How to do this using SQL (or PSQL, if that helps)? I can think of ways to do this in principle, but I always end up with deeply nested SELECTs, which I want to avoid.

Thanks in advance!

like image 292
random6174 Avatar asked Dec 29 '15 14:12

random6174


2 Answers

You could use windowed functions for it:

SELECT a,b,
    1.0 * c / CASE WHEN SUM(c) OVER(PARTITION BY a,b) = 0 THEN 1
                   ELSE SUM(c) OVER(PARTITION BY a,b) END AS c
FROM tab

SqlFiddleDemo

Output:

╔════╦════╦════════════════════╗
║ a  ║ b  ║         c          ║
╠════╬════╬════════════════════╣
║ 0  ║ 1  ║ 1                  ║
║ 1  ║ 1  ║ 0                  ║
║ 1  ║ 1  ║ 0.3333333333333333 ║
║ 1  ║ 1  ║ 0.6666666666666666 ║
║ 1  ║ 2  ║ 0.5                ║
║ 1  ║ 2  ║ 0.5                ║
║ 2  ║ 2  ║ 0                  ║           -- added for check division by 0
╚════╩════╩════════════════════╝
like image 187
Lukasz Szozda Avatar answered Oct 20 '22 14:10

Lukasz Szozda


You can use a derived table to aggregate the values and then join the results back to the original table. To avoid division by 0, a where clause has been included. For these 0 sum cases, you might have to include a special condition to select them.

select t.a, t.b, 1.0 * t.c/t1.total_c
from tablename t 
join (select a, b, sum(c) as total_c
      from tablename
      group by a, b) t1
on t.a = t1.a and t.b = t1.b
where t1.total_c > 0
like image 2
Vamsi Prabhala Avatar answered Oct 20 '22 13:10

Vamsi Prabhala