I have a DataFrame, df, in pandas with series df.A and df.B, and I am trying to create a third series, df.C, that depends on A and B as well as the previous result. That is:
C[0] = A[0]
C[n] = A[n] + B[n] * C[n-1]
What is the most efficient way of doing this? Ideally, I wouldn't have to fall back to a for loop.
Edit
This is the desired output for C given A and B. Now just need to figure out how...
import pandas as pd
a = [ 2, 3,-8,-2, 1]
b = [ 1, 1, 4, 2, 1]
c = [ 2, 5,12,22,23]
df = pd.DataFrame({'A': a, 'B': b, 'C': c})
df
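For reference, a plain Python loop that applies the recurrence and produces the C above; this is the baseline any vectorized solution should match:

```python
import pandas as pd

a = [2, 3, -8, -2, 1]
b = [1, 1, 4, 2, 1]

# Apply the recurrence: C[0] = A[0]; C[n] = A[n] + B[n] * C[n-1]
c = [a[0]]
for n in range(1, len(a)):
    c.append(a[n] + b[n] * c[n - 1])

df = pd.DataFrame({'A': a, 'B': b, 'C': c})
# df.C -> 2, 5, 12, 22, 23
```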
You can vectorize this with obnoxious cumulative products and zipping together of other vectors. But it won't end up saving you time. As a matter of fact, it will likely be numerically unstable.
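For completeness, here is a sketch of that cumulative-product vectorization. Unrolling the recurrence gives C[n] = Σ_{k≤n} A[k] · B[k+1] · … · B[n]; with the running product P[n] = B[1] · … · B[n] (and P[0] = 1), this becomes C = P · cumsum(A / P). The division by P is exactly where the numerical instability creeps in when B contains very large or very small values:

```python
import numpy as np

a = np.array([2.0, 3.0, -8.0, -2.0, 1.0])
b = np.array([1.0, 1.0, 4.0, 2.0, 1.0])

# P[n] = B[1] * ... * B[n], with P[0] = 1
p = np.concatenate(([1.0], np.cumprod(b[1:])))
# C[n] = P[n] * sum_{k<=n} A[k] / P[k]
c = p * np.cumsum(a / p)
# c -> [ 2.  5. 12. 22. 23.]
```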
Instead, you can use numba
to speed up your loop.
from numba import njit
import numpy as np
import pandas as pd
@njit
def dynamic_alpha(a, b):
    c = a.copy()
    for i in range(1, len(a)):
        c[i] = a[i] + b[i] * c[i - 1]
    return c

df.assign(C=dynamic_alpha(df.A.values, df.B.values))
A B C
0 2 1 2
1 3 1 5
2 -8 4 12
3 -2 2 22
4 1 1 23
For this simple calculation, this will be about as fast as a simple
df.assign(C=np.arange(len(df)) ** 2 + 2)
df = pd.concat([df] * 10000)
%timeit df.assign(C=dynamic_alpha(df.A.values, df.B.values))
%timeit df.assign(C=np.arange(len(df)) ** 2 + 2)
337 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
333 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)