I'm using a SQL query to determine the z-score (x - μ / σ) of several columns.
In particular, I have a table like the following:
my_table
id col_a col_b col_c
1 3 6 5
2 5 3 3
3 2 2 9
4 9 8 2
...and I want to select the z-score of every number of every row, according to the average and standard deviation of its column.
So the result would look like this:
id col_d col_e col_f
1 -0.4343 1.0203 ...
2 0.1434 -0.8729
3 -0.8234 -1.2323
4 1.889 1.5343
Currently my code computes the score for two columns and looks like this:
select id,
(my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
(my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e,
from my_table,
select col_a from my_table)mya,
select col_b from my_table)myb
group by id;
However, this is extremely slow. I've been waiting minutes for a three column query.
Is there a better way to accomplish this? I'm using postgres but any general language will help me. Thanks!
The mean is the average of all values in a group, added together, and then divided by the total number of items in the group. To calculate the Z-score, subtract the mean from each of the individual data points and divide the result by the standard deviation.
The Z-score normalized value equals (the original column value minus the mean column value) divided by the standard deviation of the column values. We will use T-SQL's AVG() function to calculate the mean and the STDEV() function to calculate the standard deviation.
How do you find the z-score with mean and standard deviation? If you know the mean and standard deviation, you can find z-score using the formula z = (x - μ) / σ where x is your data point, μ is the mean, and σ is the standard deviation.
you can use window functions like this:
select
t.id,
(t.col_a - avg(t.col_a) over()) / stdev(t.col_a) over() as col_d,
(t.col_b - avg(t.col_b) over()) / stdev(t.col_b) over() as col_e
from my_table as t
or cross join with precalculated avg
and stdev
:
select
t.id,
(t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
(t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
cross join (
select
avg(tt.col_a) as col_a_avg,
avg(tt.col_b) as col_b_avg,
stdev(tt.col_a) as col_a_stdev,
stdev(tt.col_b) as col_b_stdev
from my_table as tt
) as tt
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With