I have data sampled at essentially random intervals. I would like to compute a weighted moving average using numpy (or other python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that the values towards the center of the bin are weighted more than values towards the edges. Here I generate some sample data and then take a moving average. How can I most easily implement a weighted moving average? Thanks! <pre class="prettyprint"><code>import numpy as np import matplotlib.pyplot as plt #first generate some datapoint for a randomly sampled noisy sinewave x = np.random.random(1000)*10 noise = np.random.normal(scale=0.3,size=len(x)) y = np.sin(x) + noise #plot the data plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data') plt.xlabel('Time') plt.ylabel('Intensity') #define a moving average function def moving_average(x,y,step_size=.1,bin_size=1): bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size bin_avg = np.zeros(len(bin_centers)) for index in range(0,len(bin_centers)): bin_center = bin_centers[index] items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))] bin_avg[index] = np.mean(items_in_bin) return bin_centers,bin_avg #plot the moving average bins, average = moving_average(x,y) plt.plot(bins, average,label='moving average') plt.show() </code></pre> The output: <img src="https://i.stack.imgur.com/ca0kN.png" alt="Data and moving average"> Using the advice from crs17 to use "weights=" in the np.average function, I came up weighted average function, which uses a Gaussian function to weight the data: <pre class="prettyprint"><code>def weighted_moving_average(x,y,step_size=0.05,width=1): bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size bin_avg = np.zeros(len(bin_centers)) #We're going to weight with a Gaussian function def gaussian(x,amp=1,mean=0,sigma=1): return amp*np.exp(-(x-mean)**2/(2*sigma**2)) for index in range(0,len(bin_centers)): bin_center = bin_centers[index] weights = gaussian(x,mean=bin_center,sigma=width) bin_avg[index] = np.average(y,weights=weights) return (bin_centers,bin_avg) </code></pre> Results look good: <img src="https://i.stack.imgur.com/SV7wj.png" alt="Working weighted average using numpy">

You could use numpy.average which allows you to specify weights: <pre class="prettyprint"><code>>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights) </code></pre> So to calculate the weights you could find the x coordinates of each data point in the bin and calculate their distances to the bin center.

Weighted moving average in python

Tags:

python

pandas

numpy

scipy

I have data sampled at essentially random intervals. I would like to compute a weighted moving average using numpy (or other python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that the values towards the center of the bin are weighted more than values towards the edges.

Here I generate some sample data and then take a moving average. How can I most easily implement a weighted moving average? Thanks!

import numpy as np
import matplotlib.pyplot as plt

#first generate some datapoint for a randomly sampled noisy sinewave
x = np.random.random(1000)*10
noise = np.random.normal(scale=0.3,size=len(x))
y = np.sin(x) + noise

#plot the data
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data')
plt.xlabel('Time')
plt.ylabel('Intensity')

#define a moving average function
def moving_average(x,y,step_size=.1,bin_size=1):
    bin_centers  = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))]
        bin_avg[index] = np.mean(items_in_bin)

    return bin_centers,bin_avg

#plot the moving average
bins, average = moving_average(x,y)
plt.plot(bins, average,label='moving average')

plt.show()

The output: Data and moving average

Using the advice from crs17 to use "weights=" in the np.average function, I came up weighted average function, which uses a Gaussian function to weight the data:

def weighted_moving_average(x,y,step_size=0.05,width=1):
    bin_centers  = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))

    #We're going to weight with a Gaussian function
    def gaussian(x,amp=1,mean=0,sigma=1):
        return amp*np.exp(-(x-mean)**2/(2*sigma**2))

    for index in range(0,len(bin_centers)):
        bin_center = bin_centers[index]
        weights = gaussian(x,mean=bin_center,sigma=width)
        bin_avg[index] = np.average(y,weights=weights)

    return (bin_centers,bin_avg)

Results look good: Working weighted average using numpy

266

asked Aug 29 '13 17:08

DanHickstein

2 Answers

You could use numpy.average which allows you to specify weights:

>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights)

So to calculate the weights you could find the x coordinates of each data point in the bin and calculate their distances to the bin center.

answered Oct 22 '22 20:10

crs17

This won't give an exact solution, but it will make your life easier, and will probably be good enough... First, average your samples in small bins. Once you have resampled your data to be equispaced, you can use stride tricks and np.average to do a weighted average:

from numpy.lib.stride_tricks import as_strided

def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
                            weights=None):
    # This ensures that all samples are within a bin
    number_of_bins = int(np.ceil(np.ptp(x) / step_size))
    bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
                       num=number_of_bins+1)
    bins -= (bins[-1] - np.max(x)) / 2
    bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2

    counts, _ = np.histogram(x, bins=bins)
    vals, _ = np.histogram(x, bins=bins, weights=y)
    bin_avgs = vals / counts
    n = len(bin_avgs)
    windowed_bin_avgs = as_strided(bin_avgs,
                                   (n-steps_per_bin+1, steps_per_bin),
                                   bin_avgs.strides*2)

    weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)

    return bin_centers, weighted_average

You can now do something like this:

#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
                                        weights=weights)
plt.plot(bins, average,label='moving average')

plt.show()

enter image description here

answered Oct 22 '22 19:10

Jaime

Related questions
                            
                                How to get access of individual trees of a xgboost model in python /R
                            
                                Sklearn Pipeline - How to inherit get_params in custom Transformer (not Estimator)
                            
                                Alexa request validation in python
                            
                                Exit pdb Interactive Mode from Jupyter Notebook
                            
                                Transfer learning with tf.estimator.Estimator framework
                            
                                Twine upload TypeError: expected string or bytes-like object
                            
                                How to check if a function was called in a unit test using pytest-mock?
                            
                                Can I find out the allocation request that caused my Python MemoryError?
                            
                                Accessing neighboring cells for numpy array
                            
                                Prevent coercion of pandas data frames while indexing and inserting rows
                            
                                Best way to abstract season/show/episode data
                            
                                Why is host aborting connection?
                            
                                Difference Between Modulus Implementation in Python Vs Java
                            
                                How to join the same table in sqlalchemy
                            
                                Can you change a widget's parent in python tkinter?
                            
                                Interesting "getElementById() takes exactly 1 argument (2 given)", sometimes it occurs. Can someone explain it?
                            
                                Python - Why does extend() and append() return None (void)? [duplicate]
                            
                                Matplotlib - Stepped histogram with already binned data
                            
                                Paging output from print statement
                            
                                Using SignalR server from Python code

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With