
Computing Standard Deviation in a stream

Tags:

python

math

Using Python, assume I'm running through a known quantity of items I, and have the ability to time how long it takes to process each one, t, as well as a running total of time spent processing, T, and the number of items processed so far, c. I'm currently calculating the average on the fly as A = T / c, but this can be skewed by, say, a single item taking an extraordinarily long time to process (a few seconds compared to a few milliseconds).

I would like to show a running Standard Deviation. How can I do this without keeping a record of each t?
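
For reference, a minimal sketch of what I'm doing now (items and process_item are placeholders, not my real code):

import time

T = 0.0   # running total of processing time
c = 0     # number of items processed so far

for item in items:             # items: the known quantity of work, length I
    start = time.time()
    process_item(item)         # placeholder for the actual per-item work
    t = time.time() - start    # time taken for this item
    T += t
    c += 1
    A = T / c                  # running average, easily skewed by one slow item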

Josh K asked Apr 04 '11



2 Answers

As outlined in the Wikipedia article on the standard deviation, it is enough to keep track of the following three sums:

s0 = sum(1 for x in samples)
s1 = sum(x for x in samples)
s2 = sum(x*x for x in samples)

These sums are easily updated as new values arrive. The standard deviation can be calculated as

import math

std_dev = math.sqrt((s0 * s2 - s1 * s1) / (s0 * (s0 - 1)))
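
For instance, here is a minimal sketch of how the three sums might be maintained incrementally as each new timing arrives (the RunningStats name and its methods are illustrative, not part of the original answer):

import math

class RunningStats:
    """Keep s0 (count), s1 (sum) and s2 (sum of squares), and derive mean/std."""

    def __init__(self):
        self.s0 = 0      # number of samples seen
        self.s1 = 0.0    # running sum of samples
        self.s2 = 0.0    # running sum of squared samples

    def add(self, x):
        self.s0 += 1
        self.s1 += x
        self.s2 += x * x

    @property
    def mean(self):
        return self.s1 / self.s0

    @property
    def std_dev(self):
        # sample standard deviation; needs at least two samples
        return math.sqrt((self.s0 * self.s2 - self.s1 * self.s1) /
                         (self.s0 * (self.s0 - 1)))

Each update is O(1) and only the three sums are stored, which satisfies the constraint of not keeping every t.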

Note that this way of computing the standard deviation can be numerically ill-conditioned if your samples are floating point numbers and the standard deviation is small compared to the mean of the samples. If you expect samples of this type, you should resort to Welford's method (see the accepted answer).
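
To make the ill-conditioning concrete, here is a rough illustration (the sample values are arbitrary, chosen only so the mean dwarfs the spread):

import statistics

# timings with a large mean and a tiny spread
samples = [1e8 + d for d in (0.1, 0.2, 0.3, 0.4, 0.5)]

s0 = len(samples)
s1 = sum(samples)
s2 = sum(x * x for x in samples)

naive_var = (s0 * s2 - s1 * s1) / (s0 * (s0 - 1))
print(naive_var)                     # dominated by rounding error, may even be negative
print(statistics.variance(samples))  # ~0.025, the correct sample variance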

Sven Marnach answered Oct 04 '22


Based on Welford's algorithm:

import numpy as np


class OnlineVariance(object):
    """
    Welford's algorithm computes the sample variance incrementally.
    """

    def __init__(self, iterable=None, ddof=1):
        self.ddof, self.n, self.mean, self.M2 = ddof, 0, 0.0, 0.0
        if iterable is not None:
            for datum in iterable:
                self.include(datum)

    def include(self, datum):
        self.n += 1
        self.delta = datum - self.mean
        self.mean += self.delta / self.n
        self.M2 += self.delta * (datum - self.mean)

    @property
    def variance(self):
        return self.M2 / (self.n - self.ddof)

    @property
    def std(self):
        return np.sqrt(self.variance)

Update the variance with each new piece of data:

N = 100
data = np.random.random(N)
ov = OnlineVariance(ddof=0)
for d in data:
    ov.include(d)
std = ov.std
print(std)

Check our result against the standard deviation computed by numpy:

assert np.allclose(std, data.std()) 
unutbu answered Oct 03 '22