Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating the respective z-score of several columns

Tags:

sql

postgresql

I'm using a SQL query to determine the z-score (x - μ / σ) of several columns.

In particular, I have a table like the following:

my_table
id    col_a  col_b  col_c
1     3      6      5
2     5      3      3
3     2      2      9
4     9      8      2

...and I want to select the z-score of every number of every row, according to the average and standard deviation of its column.

So the result would look like this:

id    col_d     col_e     col_f
1    -0.4343    1.0203    ...
2     0.1434   -0.8729
3    -0.8234   -1.2323
4     1.889     1.5343

Currently my code computes the score for two columns and looks like this:

select id,
   (my_table.col_a - avg(mya.col_a)) / stddev(mya.col_a) as col_d,
   (my_table.col_b - avg(myb.col_b)) / stddev(myb.col_b) as col_e, 
from my_table,
select col_a from my_table)mya,
select col_b from my_table)myb
group by id;

However, this is extremely slow. I've been waiting minutes for a three column query.

Is there a better way to accomplish this? I'm using postgres but any general language will help me. Thanks!

like image 445
dmc7z Avatar asked Oct 09 '13 18:10

dmc7z


People also ask

How do you find the z-score for grouped data?

The mean is the average of all values in a group, added together, and then divided by the total number of items in the group. To calculate the Z-score, subtract the mean from each of the individual data points and divide the result by the standard deviation.

How do you find z-score in SQL?

The Z-score normalized value equals (the original column value minus the mean column value) divided by the standard deviation of the column values. We will use T-SQL's AVG() function to calculate the mean and the STDEV() function to calculate the standard deviation.

How do you find the z-score of a list?

How do you find the z-score with mean and standard deviation? If you know the mean and standard deviation, you can find z-score using the formula z = (x - μ) / σ where x is your data point, μ is the mean, and σ is the standard deviation.


1 Answers

you can use window functions like this:

select
    t.id,
    (t.col_a - avg(t.col_a) over()) / stdev(t.col_a) over() as col_d,
    (t.col_b - avg(t.col_b) over()) / stdev(t.col_b) over() as col_e
from my_table as t

or cross join with precalculated avg and stdev:

select
    t.id,
    (t.col_a - tt.col_a_avg) / tt.col_a_stdev as col_d,
    (t.col_b - tt.col_b_avg) / tt.col_b_stdev as col_e
from my_table as t
    cross join (
        select 
            avg(tt.col_a) as col_a_avg,
            avg(tt.col_b) as col_b_avg,
            stdev(tt.col_a) as col_a_stdev,
            stdev(tt.col_b) as col_b_stdev
        from my_table as tt
   ) as tt
like image 71
Roman Pekar Avatar answered Sep 23 '22 14:09

Roman Pekar