Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast numpy rolling_product

I need a rolling_product function, or an expanding_product function.

There are various pandas rolling_XXXX and expanding_XXXX functions, but I was surprised to discover the absence of an expanding_product() function.

To get things working I've been using this rather slow alternative

pd.expanding_apply(temp_col, lambda x : x.prod())

My arrays often have 32,000 elements so this is proving to be a bit of a bottleneck. I was tempted to try log(), cumsum(), and exp(), but I thought I should ask on here since there might be a much better solution.

like image 360
JasonEdinburgh Avatar asked May 21 '15 21:05

JasonEdinburgh


2 Answers

I have a faster mechanism, though you'll need to run some tests to see if the accuracy is sufficient.

Here's the original exp/sum/log version:

def rolling_prod1(xs, n):
    return np.exp(pd.rolling_sum(np.log(xs), n))

And here's a version that takes the cumulative product, shifts it over (pre-filling with nans), and then divides it back out.

def rolling_prod2(xs, n):
    cxs = np.cumprod(xs)
    nans = np.empty(n)
    nans[:] = np.nan
    nans[n-1] = 1.
    a = np.concatenate((nans, cxs[:len(cxs)-n]))
    return cxs / a

Both functions return the same result for this example:

In [9]: xs
Out[9]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [10]: rolling_prod1(xs, 3)
Out[10]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

In [11]: rolling_prod2(xs, 3)
Out[11]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

But the second version is much faster:

In [12]: temp_col = np.random.rand(30000)

In [13]: %timeit rolling_prod1(temp_col, 3)
1000 loops, best of 3: 694 µs per loop

In [14]: %timeit rolling_prod2(temp_col, 3)
10000 loops, best of 3: 162 µs per loop
like image 96
chrisaycock Avatar answered Oct 16 '22 18:10

chrisaycock


Early results show that this is a fast-ish approximation for expanding_product

np.exp(pd.expanding_sum(np.log(temp_col)))

rolling_product would require repeated divisions which could lead to numerical instabilities (as pointed out by @AmiTavory in a now-deleted answer)

like image 25
JasonEdinburgh Avatar answered Oct 16 '22 16:10

JasonEdinburgh