Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate moving average in numpy array with NaNs

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:

import numpy as np

def moving_average(a,n=5):
      ret = np.cumsum(a,dtype=float)
      ret[n:] = ret[n:]-ret[:-n]
      return ret[-1:]/n

When calculating with a masked array:

x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
mx = np.ma.masked_array(x,np.isnan(x))
y = moving_average(mx).filled(np.nan)

print y

>>> array([3.8,3.8,3.6,nan,nan,nan,2,2.4,nan,nan,nan,2.8,2.6])

The result I am looking for (below) should ideally have NaNs only in the place where the original array, x, had NaNs and the averaging should be done over the number of non-NaN elements in the grouping (I need some way to change the size of n in the function.)

y = array([4.75,4.75,nan,4.4,3.75,2.33,3.33,4,nan,nan,3,3.5,nan,3.25,4,4.5,3])

I could loop over the entire array and check index by index but the array I am using is very large and that would take a long time. Is there a numpythonic way to do this?

like image 780
krakenwagon Avatar asked Oct 29 '22 19:10

krakenwagon


2 Answers

Pandas has a lot of really nice functionality with this. For example:

x = np.array([np.nan, np.nan, 3, 3, 3, np.nan, 5, 7, 7])

# requires three valid values in a row or the resulting value is null

print(pd.Series(x).rolling(3).mean())

#output
nan,nan,nan, nan, 3, nan, nan, nan, 6.333

# only requires 2 valid values out of three for size=3 window

print(pd.Series(x).rolling(3, min_periods=2).mean())

#output
nan, nan, nan, 3, 3, 3, 4, 6, 6.3333

You can play around with the windows/min_periods and consider filling-in nulls all in one chained line of code.

like image 68
slevin886 Avatar answered Nov 12 '22 21:11

slevin886


I'll just add to the great answers before that you could still use cumsum to achieve this:

import numpy as np

def moving_average(a, n=5):
    ret = np.cumsum(a.filled(0))
    ret[n:] = ret[n:] - ret[:-n]
    counts = np.cumsum(~a.mask)
    counts[n:] = counts[n:] - counts[:-n]
    ret[~a.mask] /= counts[~a.mask]
    ret[a.mask] = np.nan

    return ret

x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
mx = np.ma.masked_array(x,np.isnan(x))
y = moving_average(mx)
like image 29
jerry Avatar answered Nov 12 '22 23:11

jerry