I will need your help with this one. I want to normalize numbers within groups. Say I have this dataset:
A B C
-------
0 1 3
1 1 0
1 1 2
1 1 4
1 2 1
1 2 1
I want to group these rows by A and B, then normalize the values of C within its group, i.e. summing up all C's of the group and then dividing each C by that sum. In the above example, I would expect this result:
A B C
---------
0 1 1 // sum of C's in group is 1, 1/1=1
1 1 0 // sum of C's in group is 6, 0/6=0
1 1 1/3 // ...
1 1 2/3 // ...
1 2 1/2 // sum of C's in group is 2, 1/2=1/2
1 2 1/2 // ...
Division by zero can be handled separately. How to do this using SQL (or PSQL, if that helps)? I can think of ways to do this in principle, but I always end up with deeply nested SELECTs, which I want to avoid.
Thanks in advance!
You could use windowed functions for it:
SELECT a,b,
1.0 * c / CASE WHEN SUM(c) OVER(PARTITION BY a,b) = 0 THEN 1
ELSE SUM(c) OVER(PARTITION BY a,b) END AS c
FROM tab
SqlFiddleDemo
Output:
╔════╦════╦════════════════════╗
║ a ║ b ║ c ║
╠════╬════╬════════════════════╣
║ 0 ║ 1 ║ 1 ║
║ 1 ║ 1 ║ 0 ║
║ 1 ║ 1 ║ 0.3333333333333333 ║
║ 1 ║ 1 ║ 0.6666666666666666 ║
║ 1 ║ 2 ║ 0.5 ║
║ 1 ║ 2 ║ 0.5 ║
║ 2 ║ 2 ║ 0 ║ -- added for check division by 0
╚════╩════╩════════════════════╝
You can use a derived table to aggregate the values and then join
the results back to the original table. To avoid division by 0
, a where
clause has been included. For these 0
sum cases, you might have to include a special condition to select them.
select t.a, t.b, 1.0 * t.c/t1.total_c
from tablename t
join (select a, b, sum(c) as total_c
from tablename
group by a, b) t1
on t.a = t1.a and t.b = t1.b
where t1.total_c > 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With