I have a relatively simple block of code that loops through two arrays, multiplies, and adds cumulatively:
import numpy as np
a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
c = []
d = 0
for i, val in enumerate(a):
    d += val
    c.append(d)
    d *= b[i]
Is there a way to do this without iterating? I imagine cumsum/cumprod could be used but I'm having trouble figuring out how. When you break down what's happening step by step, it looks like this:
# 0: 0 + a[0]
# 1: ((0 + a[0]) * b[0]) + a[1]
# 2: ((((0 + a[0]) * b[0]) + a[1]) * b[1]) + a[2]
Edit for clarification: I am interested in the list (or array) c.
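(As an aside, the same recurrence can be written with itertools.accumulate, though that still iterates in Python under the hood. A minimal sketch, assuming Python 3.8+ for the initial keyword -)
from itertools import accumulate
import numpy as np
a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
# carry d through the iterations: d_{i+1} = (d_i + a[i]) * b[i], starting at 0
ds = list(accumulate(range(len(a)), lambda d, i: (d + a[i]) * b[i], initial=0))
c = np.asarray(ds[:-1]) + a   # c[i] = d_i + a[i]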
In each iteration, you have -
c[n] = d[n] + a[n]    # the value appended to c
d[n+1] = c[n] * b[n]
Thus, essentially -
d[n+1] = (d[n] + a[n]) * b[n]
i.e. -
d[n+1] = (d[n] * b[n]) + K[n]   # where K[n] = a[n] * b[n]
Now, using this formula, if you write down the expressions for the first few cases up to d[3], you would have -
d[1] = d[0]*b[0] + K[0]
d[2] = d[0]*b[0]*b[1] + K[0]*b[1] + K[1]
d[3] = d[0]*b[0]*b[1]*b[2] + K[0]*b[1]*b[2] + K[1]*b[2] + K[2]
Scalars      :   b[0]*b[1]*b[2]   b[1]*b[2]   b[2]   1
Coefficients :   d[0]             K[0]        K[1]   K[2]
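As a quick sanity check, the expansion for d[3] can be verified numerically against the recurrence, using the sample arrays from the question (with d[0] = 0, so the first term drops out) -
import numpy as np
a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
K = a*b
# closed form for d[3]
d3 = K[0]*b[1]*b[2] + K[1]*b[2] + K[2]
# same value from running the recurrence d[n+1] = (d[n] + a[n]) * b[n]
d = 0.0
for n in range(3):
    d = (d + a[n]) * b[n]
print(np.isclose(d3, d))   # True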
Thus, you would need the reversed cumprod of b, then perform elementwise multiplication with the K array. Finally, to get c, perform cumsum; and since c is stored before scaling down by b, you would need to scale down the cumsum version by the reversed cumprod of b.
The final implementation would look like this -
# Get reversed cumprod of b and pad with `1` at the end
b_rev_cumprod = b[::-1].cumprod()[::-1]
B = np.hstack((b_rev_cumprod,1))
# Get K
K = a*b
# Append with 0 at the start, corresponding to the starting d
K_ext = np.hstack((0,K))
# Perform elementwise multiplication and cumsum and scale down for final c
sums = (B*K_ext).cumsum()
c = sums[1:]/b_rev_cumprod
Runtime tests and output verification
Function definitions -
def original_approach(a,b):
    c = []
    d = 0
    for i, val in enumerate(a):
        d = d+val
        c.append(d)
        d = d*b[i]
    return c
def vectorized_approach(a,b):
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod,1))
    K = a*b
    K_ext = np.hstack((0,K))
    sums = (B*K_ext).cumsum()
    return sums[1:]/b_rev_cumprod
Runtimes and verification
Case #1: OP Sample case
In [301]: # Inputs
...: a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
...: b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
...:
In [302]: original_approach(a,b)
Out[302]:
[1,
2.0099999999999998,
4.4020000000000001,
6.1320600000000001,
7.6132059999999999,
8.7613205999999995,
14.256792359999999,
18.128396179999999]
In [303]: vectorized_approach(a,b)
Out[303]:
array([ 1. , 2.01 , 4.402 , 6.13206 ,
7.613206 , 8.7613206 , 14.25679236, 18.12839618])
Case #2: Large input case
In [304]: # Inputs
...: N = 1000
...: a = np.random.randint(0,100000,N)
...: b = np.random.rand(N)+0.1
...:
In [305]: np.allclose(original_approach(a,b),vectorized_approach(a,b))
Out[305]: True
In [306]: %timeit original_approach(a,b)
1000 loops, best of 3: 746 µs per loop
In [307]: %timeit vectorized_approach(a,b)
10000 loops, best of 3: 76.9 µs per loop
Please be mindful that for extremely large input arrays, if the b elements are small fractions, then because of the cumulative operations, the initial numbers of b_rev_cumprod might underflow to zeros, resulting in NaNs in those initial places of the output.
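Here is a minimal sketch of that failure mode (the input values are hypothetical, chosen only to force the underflow) -
import numpy as np
a = np.ones(2000)
b = np.full(2000, 1e-3)   # many tiny fractions
b_rev_cumprod = b[::-1].cumprod()[::-1]
print(b_rev_cumprod[0])   # 0.0 -- the cumulative product underflowed
print(np.isnan(vectorized_approach(a, b)[0]))   # True: 0/0 yields NaN
In such cases, falling back to the loop (or processing the input in chunks) might be safer.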
Let's see if we can get even faster. I am now leaving the pure Python world to show that this purely numeric problem can be optimized even further.
The two players are @Divakar's fast vectorized version:
def vectorized_approach(a,b):
b_rev_cumprod = b[::-1].cumprod()[::-1]
B = np.hstack((b_rev_cumprod,1))
K = a*b
K_ext = np.hstack((0,K))
sums = (B*K_ext).cumsum()
return sums[1:]/b_rev_cumprod
and a cython version:
%%cython
import numpy as np
def cython_approach(long[:] a, double[:] b):
    cdef double d
    cdef size_t i, n
    n = a.shape[0]
    cdef double[:] c = np.empty(n)
    d = 0
    for i in range(n):
        d += a[i]
        c[i] = d
        d *= b[i]
    return c
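Note that the %%cython cell magic assumes an IPython/Jupyter session with the Cython extension loaded (%load_ext Cython in recent versions). Also, the function returns a typed memoryview rather than an ndarray, so for comparisons you may want to wrap the result -
c = np.asarray(cython_approach(a, b))
print(np.allclose(vectorized_approach(a, b), c))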
The cython version is about 5x faster than the vectorized version:
%timeit vectorized_approach(a,b)
-> 10000 loops, best of 3: 43.4 µs per loop
%timeit cython_approach(a,b)
-> 100000 loops, best of 3: 7.7 µs per loop
Another plus of the cython version is that it is much more readable.
The big downside is that you are leaving pure Python and, depending on your use case, compiling an extension module may not be an option for you.
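If compiling an extension module is the blocker, one possible middle ground (not part of the original answers; a sketch assuming the numba package is available) is to JIT-compile the very same loop with numba -
import numpy as np
from numba import njit

@njit
def numba_approach(a, b):
    # same recurrence as the pure-Python loop, compiled on first call
    n = a.shape[0]
    c = np.empty(n)
    d = 0.0
    for i in range(n):
        d += a[i]
        c[i] = d
        d *= b[i]
    return c
Performance should be in the same ballpark as the Cython version, since both compile essentially the same loop.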