
Removing a prior sample while using Welford's method for computing single pass variance

I'm successfully using Welford's method to compute running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.

However, in the stream of samples I sometimes encounter a "rollback" or "remove sample" order, meaning that a previous sample is no longer valid and should be removed from the calculation. I know the value of the sample to remove and when it was processed. But I'm using Welford precisely because I cannot go back and do another pass over all the data.

Is there an algorithm to successfully adjust my running variance to remove or negate a specific previously processed sample?

Monospace asked Sep 27 '22 14:09



1 Answer

Given the forward formulas

M_k = M_{k-1} + (x_k - M_{k-1}) / k
S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k),

it's possible to solve for M_{k-1} as a function of M_k, x_k, and k:

M_{k-1} = M_k - (x_k - M_k) / (k - 1).
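
For readers who want the intermediate step, the inversion can be worked out as below. This only rearranges the forward mean update; the restriction k >= 2 is made explicit here because the division by k - 1 is undefined when the sample being removed is the only one.

```latex
\begin{aligned}
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k}
  &&\Longrightarrow\quad k\,M_k = (k-1)\,M_{k-1} + x_k \\
M_{k-1} &= \frac{k\,M_k - x_k}{k-1}
  = M_k - \frac{x_k - M_k}{k-1},
  &&\qquad k \ge 2 .
\end{aligned}
```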

Then we can derive S_{k-1} straightforwardly from S_k and the rest:

S_{k-1} = S_k - (x_k - M_{k-1}) * (x_k - M_k).

It's not necessary that x_k be the last sample here; since M_k and S_k theoretically do not depend on the order of the input, we can pretend that the sample to be removed was the last to be added.

I have no idea if this is stable.
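
The add and remove steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the class name `RunningStats` and its method names are made up for this example, removing the only remaining sample is treated as a reset to the empty state, and S is clamped at zero after a removal to guard against small negative values from rounding. It does not settle the numerical stability question.

```python
class RunningStats:
    """Welford-style accumulator with sample removal (illustrative sketch)."""

    def __init__(self):
        self.k = 0        # number of samples currently included
        self.mean = 0.0   # M_k
        self.s = 0.0      # S_k, sum of squared deviations from the mean

    def add(self, x):
        """Forward update: M_k and S_k from M_{k-1} and S_{k-1}."""
        self.k += 1
        old_mean = self.mean
        self.mean = old_mean + (x - old_mean) / self.k
        self.s += (x - old_mean) * (x - self.mean)

    def remove(self, x):
        """Reverse update: pretend x was the last sample added and undo it."""
        if self.k == 0:
            raise ValueError("no samples to remove")
        if self.k == 1:
            # Assumption: removing the only sample resets to the empty state,
            # since the reverse formula would divide by k - 1 = 0.
            self.k, self.mean, self.s = 0, 0.0, 0.0
            return
        prev_mean = self.mean - (x - self.mean) / (self.k - 1)
        self.s -= (x - prev_mean) * (x - self.mean)
        # Assumption: clamp at zero, because rounding can push S slightly negative.
        self.s = max(self.s, 0.0)
        self.mean = prev_mean
        self.k -= 1

    def variance(self):
        """Sample variance S_k / (k - 1); NaN with fewer than two samples."""
        if self.k < 2:
            return float("nan")
        return self.s / (self.k - 1)
```

For example, adding 1, 2, 3, 4 and then calling `remove(2)` should leave the accumulator with the statistics of 1, 3, 4, up to floating-point rounding. The caller is responsible for passing a value that was actually added earlier; the sketch cannot check that.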

David Eisenstat answered Dec 14 '22 00:12
