I have data sampled at essentially random intervals. I would like to compute a weighted moving average using numpy (or other python package). I have a crude implementation of a moving average, but I am having trouble finding a good way to do a weighted moving average, so that the values towards the center of the bin are weighted more than values towards the edges.
Here I generate some sample data and then take a moving average. How can I most easily implement a weighted moving average? Thanks!
import numpy as np
import matplotlib.pyplot as plt
#first generate some datapoint for a randomly sampled noisy sinewave
x = np.random.random(1000)*10
noise = np.random.normal(scale=0.3,size=len(x))
y = np.sin(x) + noise
#plot the data
plt.plot(x,y,'ro',alpha=0.3,ms=4,label='data')
plt.xlabel('Time')
plt.ylabel('Intensity')
#define a moving average function
def moving_average(x,y,step_size=.1,bin_size=1):
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
items_in_bin = y[(x>(bin_center-bin_size*0.5) ) & (x<(bin_center+bin_size*0.5))]
bin_avg[index] = np.mean(items_in_bin)
return bin_centers,bin_avg
#plot the moving average
bins, average = moving_average(x,y)
plt.plot(bins, average,label='moving average')
plt.show()
The output:
Using the advice from crs17 to use "weights=" in the np.average function, I came up weighted average function, which uses a Gaussian function to weight the data:
def weighted_moving_average(x,y,step_size=0.05,width=1):
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
#We're going to weight with a Gaussian function
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=width)
bin_avg[index] = np.average(y,weights=weights)
return (bin_centers,bin_avg)
Results look good:
Syntax: def weighted_average(dataframe, value, weight): val = dataframe[value] wt = dataframe[weight] return (val * wt).sum() / wt.sum() It will return the weighted average of the item in value. In the numerator, we multiply each value with the corresponding weight associated and add them all.
Method 1: Using Numpy It provides a method called numpy. cumsum() which returns the array of the cumulative sum of elements of the given array. A moving average can be calculated by dividing the cumulative sum of elements by window size.
Weighted Moving Average (WMA) A Weighted Moving Average puts more weight on recent data and less on past data. This is done by multiplying each bar's price by a weighting factor. Because of its unique calculation, WMA will follow prices more closely than a corresponding Simple Moving Average.
To calculate exponential weights moving averages in Python, we can use the pandas ewm() function.
You could use numpy.average which allows you to specify weights:
>>> bin_avg[index] = np.average(items_in_bin, weights=my_weights)
So to calculate the weights you could find the x coordinates of each data point in the bin and calculate their distances to the bin center.
This won't give an exact solution, but it will make your life easier, and will probably be good enough... First, average your samples in small bins. Once you have resampled your data to be equispaced, you can use stride tricks and np.average
to do a weighted average:
from numpy.lib.stride_tricks import as_strided
def moving_weighted_average(x, y, step_size=.1, steps_per_bin=10,
weights=None):
# This ensures that all samples are within a bin
number_of_bins = int(np.ceil(np.ptp(x) / step_size))
bins = np.linspace(np.min(x), np.min(x) + step_size*number_of_bins,
num=number_of_bins+1)
bins -= (bins[-1] - np.max(x)) / 2
bin_centers = bins[:-steps_per_bin] + step_size*steps_per_bin/2
counts, _ = np.histogram(x, bins=bins)
vals, _ = np.histogram(x, bins=bins, weights=y)
bin_avgs = vals / counts
n = len(bin_avgs)
windowed_bin_avgs = as_strided(bin_avgs,
(n-steps_per_bin+1, steps_per_bin),
bin_avgs.strides*2)
weighted_average = np.average(windowed_bin_avgs, axis=1, weights=weights)
return bin_centers, weighted_average
You can now do something like this:
#plot the moving average with triangular weights
weights = np.concatenate((np.arange(0, 5), np.arange(0, 5)[::-1]))
bins, average = moving_weighted_average(x, y, steps_per_bin=len(weights),
weights=weights)
plt.plot(bins, average,label='moving average')
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With