
Recurrence relation in Pandas

I have a pandas DataFrame, df, with series df.A and df.B, and I am trying to create a third series, df.C, that depends on A and B as well as on its own previous value. That is:

C[0]=A[0]

C[n]=A[n] + B[n]*C[n-1]

What is the most efficient way of doing this? Ideally, I wouldn't have to fall back to a for loop.


Edit

This is the desired output for C given A and B. Now I just need to figure out how to compute it...

import pandas as pd

a = [ 2, 3,-8,-2, 1]
b = [ 1, 1, 4, 2, 1]
c = [ 2, 5,12,22,23]

df = pd.DataFrame({'A': a, 'B': b, 'C': c})
df

Big AL asked Mar 18 '18 09:03


1 Answer

You can vectorize this with obnoxious cumulative products and by zipping together other vectors, but it won't end up saving you time. On top of that, it will likely be numerically unstable.
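To make the cumulative-product idea concrete, here is a rough sketch of one way it could be written; it is not necessarily the formulation the answer has in mind. Unrolling the recurrence gives C[n] = P[n] * sum(A[k] / P[k] for k <= n), where P[n] is the running product B[1] * ... * B[n] and P[0] = 1 (B[0] never enters the recurrence). The function name recurrence_cumprod is made up for illustration. The division by P is where the instability comes from: a single zero in B makes it divide by zero, and a long run of large or small B values overflows or underflows the running product.

```python
import numpy as np

def recurrence_cumprod(a, b):
    """Closed-form C[n] = A[n] + B[n] * C[n-1] via cumulative products.

    Hypothetical helper for illustration only; breaks down when B
    contains zeros or when the running product over/underflows.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = b.copy()
    p[0] = 1.0              # B[0] is unused by the recurrence, so P[0] = 1
    p = np.cumprod(p)       # P[n] = B[1] * ... * B[n]
    return p * np.cumsum(a / p)

# Sample data from the question
a = [2, 3, -8, -2, 1]
b = [1, 1, 4, 2, 1]
c_vec = recurrence_cumprod(a, b)
print(c_vec)
```

On the question's sample data this reproduces the desired C column, but only because every B value there is small and non-zero.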

Instead, you can use numba to speed up your loop.

from numba import njit
import numpy as np
import pandas as pd

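@njit
def dynamic_alpha(a, b):
    # c starts as a copy of a, so c[0] = a[0] and c keeps a's dtype
    c = a.copy()
    # each value depends on the previous result, hence the explicit loop
    for i in range(1, len(a)):
        c[i] = a[i] + b[i] * c[i - 1]
    return c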

df.assign(C=dynamic_alpha(df.A.values, df.B.values))

   A  B   C
0  2  1   2
1  3  1   5
2 -8  4  12
3 -2  2  22
4  1  1  23
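One thing to watch with a loop of this shape: when the output array is created as a copy of A, it has A's dtype, and storing a fractional value into a NumPy integer array truncates it. The sketch below is a plain-Python variant (no numba, so it is slow on large frames) that casts both inputs to float64 first. The name recurrence_loop is hypothetical, and the cast is an assumption about what is wanted when B can be fractional.

```python
import numpy as np
import pandas as pd

def recurrence_loop(a, b):
    """Plain-Python reference for C[n] = A[n] + B[n] * C[n-1], C[0] = A[0].

    Hypothetical helper; inputs are cast to float64 so fractional
    results are not truncated into an integer array.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c = np.empty_like(a)
    if len(a) == 0:
        return c
    c[0] = a[0]
    for i in range(1, len(a)):
        c[i] = a[i] + b[i] * c[i - 1]
    return c

# Same data as the question
df = pd.DataFrame({'A': [2, 3, -8, -2, 1], 'B': [1, 1, 4, 2, 1]})
result = df.assign(C=recurrence_loop(df['A'].to_numpy(), df['B'].to_numpy()))
print(result)

# Integer A with fractional B: the float cast keeps 1.5 and 1.75 intact
frac = recurrence_loop([1, 1, 1], [1, 0.5, 0.5])
print(frac)
```

The same cast could be applied to the arrays before passing them to a compiled version of the loop.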

For this simple calculation, the numba loop is about as fast as a simple vectorized assignment such as the one below, as the timings on a 50,000-row frame show:

df.assign(C=np.arange(len(df)) ** 2 + 2)

df = pd.concat([df] * 10000)
%timeit df.assign(C=dynamic_alpha(df.A.values, df.B.values))
%timeit df.assign(C=np.arange(len(df)) ** 2 + 2)

337 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
333 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
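%timeit is an IPython magic, so the timing lines above only run in IPython or Jupyter. As a rough sketch for a plain script, the standard-library timeit module can do a similar job. The helper best_time and the function baseline are hypothetical names, and the repeat and number values are arbitrary. The warm-up call matters when the function is decorated with @njit, because the first call includes compilation time.

```python
import timeit
import numpy as np

def best_time(func, *args, repeat=5, number=20):
    """Return the best per-call time in seconds (hypothetical helper)."""
    func(*args)  # warm-up call; for an @njit function this triggers compilation
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number

def baseline(n):
    # Same vectorized expression the answer uses as its reference point
    return np.arange(n) ** 2 + 2

seconds = best_time(baseline, 50_000)
print(f"baseline: {seconds * 1e6:.1f} us per call")
```

Any callable can be passed in the same way, for example the compiled loop together with the two column arrays.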

piRSquared answered Sep 20 '22 12:09