I'm successfully using Welford's method to compute running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.
However, in the stream of samples I sometimes encounter a "rollback" or "remove sample" order, meaning that a previous sample is no longer valid and should be removed from the calculation. I know the value of the sample to remove and when it was processed. But I'm using Welford precisely because I cannot go back and do another pass over all the data.
Is there an algorithm to successfully adjust my running variance to remove or negate a specific previously processed sample?
Given the forward formulas
M_k = M_{k-1} + (x_k - M_{k-1}) / k
S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k),
it's possible to solve for M_{k-1} as a function of M_k, x_k, and k:
M_{k-1} = M_k - (x_k - M_k) / (k - 1).
Then we can derive S_{k-1} straightforwardly from S_k and the rest:
S_{k-1} = S_k - (x_k - M_{k-1}) * (x_k - M_k).
It's not necessary that x_k be the last sample here; since M_k and S_k theoretically do not depend on the order of the input, we can pretend that the sample to be removed was the last to be added.
I have no idea whether this is numerically stable.
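For illustration, here is a minimal Python sketch of the forward and inverse updates above. The class name `RunningStats` and its methods are hypothetical, chosen only for this example. Two details are my own assumptions rather than part of the derivation: removing the only remaining sample resets the accumulator (the inverse mean formula would divide by k - 1 = 0), and S is clamped at zero in case rounding pushes it slightly negative.

```python
import math


class RunningStats:
    """Welford accumulator with a hypothetical remove() step (a sketch)."""

    def __init__(self):
        self.k = 0    # number of samples currently included
        self.m = 0.0  # running mean, M_k
        self.s = 0.0  # running sum of squared deviations, S_k

    def add(self, x):
        # Forward update: M_k and S_k from M_{k-1}, S_{k-1}, x_k.
        self.k += 1
        m_prev = self.m
        self.m = m_prev + (x - m_prev) / self.k
        self.s += (x - m_prev) * (x - self.m)

    def remove(self, x):
        # Inverse update: treat x as if it were the last sample added.
        if self.k == 0:
            raise ValueError("no samples to remove")
        if self.k == 1:
            # Assumption: reset, since the inverse formula divides by k - 1.
            self.k, self.m, self.s = 0, 0.0, 0.0
            return
        m_prev = self.m - (x - self.m) / (self.k - 1)
        s_prev = self.s - (x - m_prev) * (x - self.m)
        self.s = max(s_prev, 0.0)  # assumption: clamp rounding error at zero
        self.m = m_prev
        self.k -= 1

    def variance(self):
        # Sample variance; defined here as 0.0 for fewer than two samples.
        return self.s / (self.k - 1) if self.k > 1 else 0.0

    def std(self):
        return math.sqrt(self.variance())


# Usage: add an outlier, then roll it back.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
stats.add(100.0)
stats.remove(100.0)
print(stats.k, stats.m, stats.variance())
```

After the rollback, the mean and variance should match those of the original eight samples up to floating-point rounding. This sketch does not check that the removed value was ever actually added; that bookkeeping is left to the caller.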