I need a rolling_product function, or an expanding_product function. There are various pandas rolling_XXXX and expanding_XXXX functions, but I was surprised to discover the absence of an expanding_product() function.
To get things working I've been using this rather slow alternative:
pd.expanding_apply(temp_col, lambda x: x.prod())
My arrays often have 32,000 elements, so this is proving to be a bit of a bottleneck. I was tempted to try log(), cumsum(), and exp(), but I thought I should ask on here since there might be a much better solution.
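For reference, the log/cumsum/exp idea I had in mind would look roughly like this (an untested sketch; it assumes the values are strictly positive, since log() is involved):
import numpy as np
import pandas as pd

# Sketch of the log/cumsum/exp idea for an expanding product.
# Assumes temp_col contains strictly positive values.
temp_col = pd.Series(np.random.rand(32000) + 0.5)
expanding_prod = np.exp(np.log(temp_col).cumsum())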
I have a faster mechanism, though you'll need to run some tests to see if the accuracy is sufficient.
Here's the original exp/sum/log version:
def rolling_prod1(xs, n):
    # exp(rolling_sum(log(x))) -- assumes the values are positive.
    return np.exp(pd.rolling_sum(np.log(xs), n))
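(Note that pd.rolling_sum has since been removed from pandas; on current versions something along these lines should be equivalent, though I haven't benchmarked it, so treat it as a sketch.)
import numpy as np
import pandas as pd

def rolling_prod1_modern(xs, n):
    # Same log/rolling-sum/exp idea on current pandas; assumes xs > 0.
    return np.exp(pd.Series(np.log(xs)).rolling(n).sum()).to_numpy()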
And here's a version that takes the cumulative product, shifts it over (pre-filling with nans), and then divides it back out.
def rolling_prod2(xs, n):
    # Cumulative product of the whole array.
    cxs = np.cumprod(xs)
    # Shifted copy, padded with NaNs so the first n-1 outputs are NaN.
    nans = np.empty(n)
    nans[:] = np.nan
    nans[n-1] = 1.
    a = np.concatenate((nans, cxs[:len(cxs)-n]))
    # Dividing the cumulative product by its shifted version gives the
    # product over each length-n window.
    return cxs / a
Both functions return the same result for this example:
In [9]: xs
Out[9]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [10]: rolling_prod1(xs, 3)
Out[10]: array([ nan, nan, 6., 24., 60., 120., 210., 336., 504.])
In [11]: rolling_prod2(xs, 3)
Out[11]: array([ nan, nan, 6., 24., 60., 120., 210., 336., 504.])
But the second version is much faster:
In [12]: temp_col = np.random.rand(30000)
In [13]: %timeit rolling_prod1(temp_col, 3)
1000 loops, best of 3: 694 µs per loop
In [14]: %timeit rolling_prod2(temp_col, 3)
10000 loops, best of 3: 162 µs per loop
Early results show that this is a fast-ish approximation for expanding_product:
np.exp(pd.expanding_sum(np.log(temp_col)))
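(pd.expanding_sum has also been removed from current pandas.) For strictly positive data, an exact expanding product is simply a cumulative product; whether it behaves any better than the exp/sum/log version for your data, e.g. with respect to underflow toward zero on long arrays of values below 1, is worth testing. A sketch:
import numpy as np
import pandas as pd

temp_col = np.random.rand(30000)
# Exact expanding product: simply the cumulative product.
expanding_prod = pd.Series(temp_col).cumprod()
# or, staying in numpy:
expanding_prod_np = np.cumprod(temp_col)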
rolling_product would require repeated divisions, which could lead to numerical instabilities (as pointed out by @AmiTavory in a now-deleted answer).
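For an exact rolling product that avoids the division step entirely, recent pandas also allows something like the following; it is slower than the cumprod trick and, again, only a sketch I haven't benchmarked:
import numpy as np
import pandas as pd

def rolling_prod_exact(xs, n):
    # Exact rolling product via rolling().apply(); no division involved,
    # so it sidesteps the instability mentioned above, at some speed cost.
    return pd.Series(xs).rolling(n).apply(np.prod, raw=True).to_numpy()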