 

Moving average or running mean



UPDATE: more efficient solutions have been proposed; uniform_filter1d from scipy is probably the best among the "standard" 3rd-party libraries, and some newer or specialized libraries are available too.


You can use np.convolve for that:

np.convolve(x, np.ones(N)/N, mode='valid')

Explanation

The running mean is a case of the mathematical operation of convolution. For the running mean, you slide a window along the input and compute the mean of the window's contents. For discrete 1D signals, convolution is the same thing, except instead of the mean you compute an arbitrary linear combination, i.e., multiply each element by a corresponding coefficient and add up the results. Those coefficients, one for each position in the window, are sometimes called the convolution kernel. The arithmetic mean of N values is (x_1 + x_2 + ... + x_N) / N, so the corresponding kernel is (1/N, 1/N, ..., 1/N), and that's exactly what we get by using np.ones(N)/N.
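
As a small sanity check of that equivalence (illustrative values of my own choosing):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
N = 3
# convolution with a kernel of N equal weights 1/N ...
print(np.convolve(x, np.ones(N)/N, mode='valid'))              # [2. 3. 4.]
# ... matches the mean of each length-N window computed directly
print([float(x[i:i+N].mean()) for i in range(len(x) - N + 1)])  # [2.0, 3.0, 4.0]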

Edges

The mode argument of np.convolve specifies how to handle the edges. I chose the valid mode here because I think that's how most people expect the running mean to work, but you may have other priorities. Here is a plot that illustrates the difference between the modes:

import numpy as np
import matplotlib.pyplot as plt
modes = ['full', 'same', 'valid']
for m in modes:
    plt.plot(np.convolve(np.ones(200), np.ones(50)/50, mode=m));
plt.axis([-10, 251, -.1, 1.1]);
plt.legend(modes, loc='lower center');
plt.show()

[Figure: running mean computed with the 'full', 'same', and 'valid' convolve modes]
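
For reference, the three modes also differ in output length: for an input of length n and a window of length N, 'full' gives n + N - 1 points, 'same' gives n, and 'valid' gives n - N + 1. A small check:

import numpy as np

n, N = 200, 50
x = np.ones(n)
kernel = np.ones(N)/N
for m in ['full', 'same', 'valid']:
    print(m, len(np.convolve(x, kernel, mode=m)))   # 249, 200 and 151 points respectively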


Efficient solution

Convolution is much better than the straightforward approach, but it is still relatively slow for large windows. Specifically for computing the running mean, the following approach works fine:

import numpy

def running_mean(x, N):
    # prepend a zero so that cumsum[i] holds the sum of the first i elements;
    # the difference of two cumulative sums N apart is then the sum over a window of N values
    cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)

The code to check the speed:

In[3]: x = numpy.random.random(100000)
In[4]: N = 1000
In[5]: %timeit result1 = numpy.convolve(x, numpy.ones((N,))/N, mode='valid')
10 loops, best of 3: 41.4 ms per loop
In[6]: %timeit result2 = running_mean(x, N)
1000 loops, best of 3: 1.04 ms per loop

Note that numpy.allclose(result1, result2) is True, so the two methods are equivalent. The greater N is, the greater the difference in time.
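
For completeness, the equivalence can be checked directly (re-using x, N and running_mean defined above):

result1 = numpy.convolve(x, numpy.ones(N)/N, mode='valid')
result2 = running_mean(x, N)
print(numpy.allclose(result1, result2))   # True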

Warning: although cumsum is faster, there will be increased floating point error that may make your results invalid/incorrect/unacceptable.

The comments pointed out this floating point error issue, but I am making it more obvious here in the answer:

# demonstrate loss of precision with only 100,000 points
# (running_mean_convolve and running_mean_cumsum are defined further below)
import numpy as np

np.random.seed(42)
x = np.random.randn(100000) + 1e6
y1 = running_mean_convolve(x, 10)
y2 = running_mean_cumsum(x, 10)
# this assertion fails: the cumsum result has drifted away from the convolve result
assert np.allclose(y1, y2, rtol=1e-12, atol=0)
  • the more points you accumulate over, the greater the floating point error (so 1e5 points is noticeable, 1e6 points is more significant, and beyond 1e6 you may want to reset the accumulators)
  • you can cheat by using np.longdouble, but the floating point error will still become significant for a relatively large number of points (around >1e5, but it depends on your data)
  • you can plot the error and see it increasing relatively fast (see the sketch after this list)
  • the convolve solution is slower but does not have this floating point loss of precision
  • the uniform_filter1d solution is faster than this cumsum solution AND does not have this floating point loss of precision
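
A rough sketch of how that error can be plotted, re-using the running_mean_convolve and running_mean_cumsum functions defined further down (the offset of 1e6 and the window size are arbitrary illustration choices):

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.random.randn(100000) + 1e6          # large offset makes the cancellation error visible
N = 10
reference = running_mean_convolve(x, N)    # convolve does not accumulate error
drifting = running_mean_cumsum(x, N)

plt.plot(np.abs(drifting - reference))     # error grows with the number of accumulated values
plt.xlabel('window position')
plt.ylabel('absolute error of the cumsum running mean')
plt.show()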

Update: The example below shows the old pandas.rolling_mean function which has been removed in recent versions of pandas. A modern equivalent of that function call would use pandas.Series.rolling:

In [8]: pd.Series(x).rolling(window=N).mean().iloc[N-1:].values
Out[8]: 
array([ 0.49815397,  0.49844183,  0.49840518, ...,  0.49488191,
        0.49456679,  0.49427121])

pandas is more suitable for this than NumPy or SciPy. Its function rolling_mean does the job conveniently. It also returns a NumPy array when the input is an array.

It is difficult to beat rolling_mean in performance with any custom pure Python implementation. Here is an example of its performance against two of the proposed solutions:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: def running_mean(x, N):
   ...:     cumsum = np.cumsum(np.insert(x, 0, 0)) 
   ...:     return (cumsum[N:] - cumsum[:-N]) / N
   ...:

In [4]: x = np.random.random(100000)

In [5]: N = 1000

In [6]: %timeit np.convolve(x, np.ones((N,))/N, mode='valid')
10 loops, best of 3: 172 ms per loop

In [7]: %timeit running_mean(x, N)
100 loops, best of 3: 6.72 ms per loop

In [8]: %timeit pd.rolling_mean(x, N)[N-1:]
100 loops, best of 3: 4.74 ms per loop

In [9]: np.allclose(pd.rolling_mean(x, N)[N-1:], running_mean(x, N))
Out[9]: True

There are also nice options as to how to deal with the edge values.
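
For example, with the modern pandas.Series.rolling API the edge handling can be changed through the min_periods and center arguments (a small illustrative sketch):

import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float))

# default: NaN for the first N-1 positions
print(s.rolling(window=3).mean().values)
# allow shorter windows at the start instead of NaN
print(s.rolling(window=3, min_periods=1).mean().values)
# centre the window on each point instead of trailing it
print(s.rolling(window=3, center=True).mean().values)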


You can use scipy.ndimage.uniform_filter1d:

import numpy as np
from scipy.ndimage import uniform_filter1d
N = 1000
x = np.random.random(100000)
y = uniform_filter1d(x, size=N)

uniform_filter1d:

  • gives the output with the same numpy shape as the input (i.e. the same number of points)
  • allows multiple ways to handle the border; 'reflect' is the default, but in my case I wanted 'nearest' instead (see the short example after this list)
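
A short sketch of switching the border handling (only the mode argument differs from the call above):

import numpy as np
from scipy.ndimage import uniform_filter1d

x = np.random.random(100000)
N = 1000
y_reflect = uniform_filter1d(x, size=N)                  # default border handling: 'reflect'
y_nearest = uniform_filter1d(x, size=N, mode='nearest')  # repeat the edge value instead
print(y_reflect.shape, y_nearest.shape)                  # both keep the shape of x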

It is also rather quick (nearly 50 times faster than np.convolve and 2-5 times faster than the cumsum approach given above):

%timeit y1 = np.convolve(x, np.ones((N,))/N, mode='same')
100 loops, best of 3: 9.28 ms per loop

%timeit y2 = uniform_filter1d(x, size=N)
10000 loops, best of 3: 191 µs per loop

Here are 3 functions that let you compare the error/speed of the different implementations:

from __future__ import division
import numpy as np
from scipy.ndimage import uniform_filter1d

def running_mean_convolve(x, N):
    # sliding-window mean as a convolution with a kernel of N equal weights
    return np.convolve(x, np.ones(N) / float(N), 'valid')

def running_mean_cumsum(x, N):
    # difference of cumulative sums; fast, but accumulates floating point error
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)

def running_mean_uniform_filter1d(x, N):
    # scipy's uniform filter, shifted and trimmed so it lines up with the 'valid' outputs above
    return uniform_filter1d(x, N, mode='constant', origin=-(N//2))[:-(N-1)]
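
One possible way to use them for a quick comparison (illustrative only; the array length, offset and window size are my own choices):

import timeit
import numpy as np

x = np.random.randn(100000) + 1e6
N = 100

y_conv = running_mean_convolve(x, N)
y_cumsum = running_mean_cumsum(x, N)
y_filter = running_mean_uniform_filter1d(x, N)

# all three return len(x) - N + 1 points, so they can be compared element-wise
print('max cumsum error:', np.abs(y_cumsum - y_conv).max())
print('max filter error:', np.abs(y_filter - y_conv).max())

for f in (running_mean_convolve, running_mean_cumsum, running_mean_uniform_filter1d):
    print(f.__name__, timeit.timeit(lambda: f(x, N), number=10), 'seconds for 10 runs')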

You can calculate a running mean with:

import numpy as np

def runningMean(x, N):
    y = np.zeros((len(x),))
    for ctr in range(len(x)):
        # sum of the window starting at ctr (the slice is shorter than N near the end of x)
        y[ctr] = np.sum(x[ctr:(ctr+N)])
    return y/N

But it's slow.

Fortunately, numpy includes a convolve function which we can use to speed things up. The running mean is equivalent to convolving x with a vector that is N long, with all members equal to 1/N. The numpy implementation of convolve includes the starting transient, so you have to remove the first N-1 points:

def runningMeanFast(x, N):
    return np.convolve(x, np.ones((N,))/N)[(N-1):]

On my machine, the fast version is 20-30 times faster, depending on the length of the input vector and size of the averaging window.
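
A quick sanity check that the two functions return the same values (a small sketch; array length and window size are arbitrary):

import numpy as np

x = np.random.random(10000)
N = 50
# both versions divide the partial trailing windows by N, so they agree element-wise
assert np.allclose(runningMean(x, N), runningMeanFast(x, N))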

Note that convolve does include a 'same' mode which seems like it should address the starting transient issue, but it splits it between the beginning and end.