Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Computing standard deviation over a circular buffer

I need to calculate the standard deviation of values are stored in a circular buffer. The final algorithm will run on a resource-constrained device, thus I want it to be as lightweight as possible. The naive approach would be to re-evaluate the standard deviation over the whole buffer each time that a new value is pushed in, but it would be really slow. Ideally, I'd like an algorithm that dynamically updates the current value of standard deviation as new values are pushed in.

Wikipedia reports some techniques for rapid calculation, but they can be used on streams: in my case when a new value is pushed in, the standard deviation should be calculated as if the last value that has been popped never existed.

tl;dr: how can I calculate standard deviation over a circular buffer with minimal computational effort?

like image 511
F. Cali Avatar asked Dec 17 '22 12:12

F. Cali


2 Answers

Standard deviation can be expressed as:

 stddev = sqrt(1/N * SUM(x[i]^2) - 1/N^2 * SUM(x[i])^2)

For the uncorrected sample standard deviation. For the corrected sample standard deviation one can write:

 stddev = sqrt(1/(N-1) * SUM(x[i]^2) - 1/(N^2-N) * SUM(x[i])^2)

If you maintain two accumulators, one for SUM(x[i]^2), and one for SUM(x[i]), then these are trivial to update for each new value (subtract the effect of the oldest value, and add in the effect of the newest value). N will of course be the length of your buffer.

Of course, if you are doing this in floating-point, then you will probably run into roundoff errors ((x + y) - x != y, in general). I don't think there's any easy fix for that.

like image 184
Oliver Charlesworth Avatar answered Dec 19 '22 00:12

Oliver Charlesworth


Keep three numbers for the set: The count of values, the sum of the values, and the sum of the squares of the values. Let's call these k, sn, and sn2 for convenience.

If, as I think you're implying, a new value always replaces an old value in the circular queue, then maybe the count is constant. Or maybe it's possible to have a less than full queue. Either way:

Every time you add a value to the queue, add one to the count, add this value to the sum, and add the square of this value to the square-sum. That is, if you add a new value "n", then k=k+1, sn=sn+n, and sn2=sn2+n^2.

Every time you remove a value from the queue, subtract one from the count, subtract this value from the sum, and subtract the square of this value from the square-sum. That is, if you remove a value "n", then k=k-1, sn=sn-n, and sn2=sn2-n^2.

You can then easily recalculate the standard deviation after every change without having to retotal everything.

Note this means that you do have to be able to "catch" a value before it is truly deleted.

Note: I suspect this is how my pocket calculator does it, because it has functions to dump sum(n) and sum(n^2), and I can "delete" a value that I never added, like I can say add 2 and 4 to the set, and then delete 3, and it say's okay fine. So I don't think it's keeping a list: it must just be keeping the count and the sums.

like image 38
Jay Avatar answered Dec 19 '22 01:12

Jay