
Compute weighted sums on rolling window with pandas dataframes of different length

I have a large dataframe (more than 5,000,000 rows) on which I am performing a rolling calculation.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10000, 1), columns=['rand'])
sum_abs = df.rolling(5).sum()

I would like to do the same calculation as a weighted sum, multiplying the values in each window by a set of weights.

df2 = pd.DataFrame(pd.Series([1, 2, 3, 4, 5], name='weight'))
df3 = df.mul(df2.set_index(df.index)).rolling(5).sum()  # fails: df2 has 5 rows, df.index has 10000

However, I get a "Length mismatch: expected axis has 5 elements" error. I know I could do something like [a * b for a, b in zip(L, weight)] if I converted everything to lists, but I would like to keep the data in a dataframe if possible. Is there a way to multiply frames of different sizes, or do I need to repeat the set of weights to match the length of the dataset I'm multiplying against?

user3170242 asked Mar 31 '17 19:03



1 Answer

An easy way to do this is to apply the weights inside each rolling window:

w = np.arange(1, 6)
df.rolling(5).apply(lambda x: (x * w).sum())
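To check the rolling apply on numbers small enough to verify by hand, here is a minimal sketch. The frame `small` is illustrative and not part of the answer. The `raw=True` argument is also an addition: it assumes pandas 0.23 or later and passes each window to the lambda as a NumPy array instead of a Series.

```python
import numpy as np
import pandas as pd

# Small deterministic input so the result can be checked by hand.
small = pd.DataFrame({'rand': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Weights 1..5, applied from the oldest to the newest value in each window.
w = np.arange(1, 6)

# raw=True hands each window to the lambda as a plain NumPy array.
weighted = small.rolling(5).apply(lambda x: (x * w).sum(), raw=True)
print(weighted)
```

The first four rows are NaN because the window is not yet full. Row 4 is 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55 and row 5 is 2*1 + 3*2 + 4*3 + 5*4 + 6*5 = 70.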

A less easy way is to use strides:

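from numpy.lib.stride_tricks import as_strided as strided

v = df.values
n, m = v.shape
s1, s2 = v.strides
k = 5  # window length
w = np.arange(1, 6).reshape(1, 1, k)  # weights, broadcast along the window axis
# View v as overlapping windows of shape (n - k + 1, m, k) without copying,
# multiply by the weights, and sum over each window.
pd.DataFrame(
    (strided(v, (n - k + 1, m, k), (s1, s2, s1)) * w).sum(-1),
    df.index[k - 1:], df.columns)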
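On NumPy 1.20 or later, `sliding_window_view` builds the same kind of windowed view without computing strides by hand. This is a swapped-in alternative to `as_strided`, not what the answer uses. The seeded random data and the names `windows`, `fast`, `slow`, and `same` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Seeded data so the comparison is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 1)), columns=['rand'])

k = 5                     # window length
w = np.arange(1, k + 1)   # weights 1..k

# Read-only view of shape (n - k + 1, m, k); the window axis is appended last.
windows = sliding_window_view(df.to_numpy(), k, axis=0)

# The matrix product with w sums value * weight over the window axis.
fast = pd.DataFrame(windows @ w, index=df.index[k - 1:], columns=df.columns)

# Reference result from the rolling apply, with the incomplete windows dropped.
slow = df.rolling(k).apply(lambda x: (x * w).sum(), raw=True).dropna()

same = np.allclose(fast.to_numpy(), slow.to_numpy())
print(same)
```

Unlike the rolling apply, the result has no leading NaN rows: it starts at index `k - 1`, matching the strided version above.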

Naive time test [figure: timing comparison, image not recovered]
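A rough way to time the two approaches locally is sketched below. The wrapper functions `rolling_apply` and `rolling_strided`, the 10,000-row frame, and the repeat count are assumptions chosen for illustration. Absolute numbers will depend on the machine and on the pandas and NumPy versions, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided

df = pd.DataFrame(np.random.randn(10000, 1), columns=['rand'])
k = 5                     # window length
w = np.arange(1, k + 1)   # weights 1..k


def rolling_apply(frame):
    # Weighted sum computed window by window in Python.
    return frame.rolling(k).apply(lambda x: (x * w).sum(), raw=True)


def rolling_strided(frame):
    # Weighted sum computed on a strided view of overlapping windows.
    v = frame.to_numpy()
    n, m = v.shape
    s1, s2 = v.strides
    windows = as_strided(v, (n - k + 1, m, k), (s1, s2, s1))
    return pd.DataFrame((windows * w).sum(-1), frame.index[k - 1:], frame.columns)


repeats = 3
t_apply = timeit.timeit(lambda: rolling_apply(df), number=repeats)
t_strided = timeit.timeit(lambda: rolling_strided(df), number=repeats)
print(f"apply:   {t_apply / repeats:.4f} s per call")
print(f"strided: {t_strided / repeats:.4f} s per call")
```

Before comparing timings it is worth confirming that both functions return the same values, for example with `np.allclose` on the non-NaN rows.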

piRSquared answered Nov 18 '22 19:11
