Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

weighted standard deviation in sql server without aggregation error

Redoing the weighted mean (which is already in another column) in working out the weighted-Sum-Of-Squared-Deviations, results in the error "Cannot perform an aggregate function on an expression containing an aggregate or a subquery."

SQRT(SUM(wt.value*SQUARE(out1.value-(SUM(ISNULL(out1.value,0)*wt.value)/SUM(wt.value))))
    / (((COUNT(wt.value)-1)*SUM(wt.value))/(COUNT(wt.value)))) as wsdevInc0

Can you suggest how to calculate the WEIGHTED STANDARD DEVIATION in SQL, short of adding another level of 'SELECT' to the query hierarchy (and probably duplicating the selection of data)?

like image 797
user1444275 Avatar asked Jan 16 '23 11:01

user1444275


1 Answers

I just encountered this same problem and error message. The way to solve it is to rewrite the weighted standard deviation formula.

The population variance formula, using <>'s to denote an average, which also applies to a weighted average, is:

variance = <(x - <x>)^2>

This form contains an aggregate function embedded in another aggregate function, hence the SQL error message.

The solution is to rewrite the formula without nested aggregations:

variance = <x^2> - <x>^2

This directly follow by multiplying out the previous formula, and noting that <x<x>> = <x><x>.

The standard deviation is just the square root of the variance, so the SQL becomes:

SQRT( SUM(w*SQUARE(x))/SUM(w) - SQUARE(SUM(w*x)/SUM(w)) )

where w is the weight, and x is the statistical variable.

The above formulas are applicable to a population data set. For a sample data set, multiply the above SQL value by

SQRT( COUNT(*)/(COUNT(*)-1) )
like image 144
Matt Avatar answered Feb 01 '23 09:02

Matt