 

Pandas: building a column with self-referencing past values

I need to generate a column that starts with an initial value and is then generated by a function that includes past values of that same column. For example:

import pandas as pd

df = pd.DataFrame({'a': [1,1,5,2,7,8,16,16,16]})
df['b'] = 0
df.loc[0, 'b'] = 1   # .ix was removed in pandas 1.0; .loc works with the default integer index
df

    a  b
0   1  1
1   1  0
2   5  0
3   2  0
4   7  0
5   8  0
6  16  0
7  16  0
8  16  0

Now I want to generate the rest of column 'b' by taking the minimum of the previous row (over both columns) and adding two. One solution would be:

for i in range(1, len(df)):
    # minimum over the previous row (both 'a' and 'b'), plus two
    df.loc[i, 'b'] = df.loc[i-1, :].min() + 2

This results in the desired output:

    a   b
0   1   1
1   1   3
2   5   3
3   2   5
4   7   4
5   8   6
6  16   8
7  16  10
8  16  12
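The loop above is a first-order recurrence, b[i] = min(b[i-1], a[i-1]) + 2, so it can also be written without per-row `.loc` indexing. A minimal dependency-free sketch, assuming Python 3.8+ (for the `initial` keyword of `itertools.accumulate`) and a non-empty column; `make_b_accumulate`, `b0` and `step` are hypothetical names used only for illustration. It is still a Python-level loop, not vectorized, but it should avoid much of the indexing overhead.

```python
from itertools import accumulate

import pandas as pd


def make_b_accumulate(a, b0=1, step=2):
    # Hypothetical helper: b[0] = b0, b[i] = min(b[i-1], a[i-1]) + step.
    # accumulate() yields `initial` first, then func(previous_result, next_item),
    # so feeding it a[:-1] produces exactly len(a) values (assumes a is non-empty).
    return list(
        accumulate(a[:-1], lambda prev_b, prev_a: min(prev_b, prev_a) + step, initial=b0)
    )


df = pd.DataFrame({'a': [1, 1, 5, 2, 7, 8, 16, 16, 16]})
df['b'] = make_b_accumulate(df['a'].tolist())
print(df)
```

The same pattern works for any recurrence where the new value depends only on the previous output and the previous input: only the lambda changes.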

Does pandas have a 'clean' way to do this? Preferably one that would vectorize the computation?

asked Oct 14 '16 16:10 by michael_j_ward


1 Answer

pandas doesn't have a great way to handle general recursive calculations. There may be some trick to vectorize it, but if you can take on the dependency, this is relatively painless and very fast with numba.

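import numba
import numpy as np

@numba.njit
def make_b(a):
    b = np.zeros_like(a)
    b[0] = 1                              # initial value
    for i in range(1, len(a)):
        # minimum of the previous row (a[i-1] and b[i-1]), plus two
        b[i] = min(b[i-1], a[i-1]) + 2

    return b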

df['b'] = make_b(df['a'].values)

df
    a   b
0   1   1
1   1   3
2   5   3
3   2   5
4   7   4
5   8   6
6  16   8
7  16  10
8  16  12
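For this particular recurrence, a trick to vectorize it does seem to exist, because adding a constant distributes over min: min(x, y) + 2 = min(x + 2, y + 2). Substituting c[i] = b[i] - 2*i turns b[i] = min(b[i-1], a[i-1]) + 2 into c[i] = min(c[i-1], a[i-1] - 2*(i-1)), which is just a running minimum seeded with c[0] = b[0]. The sketch below relies on that substitution and on `np.minimum.accumulate`; `make_b_vectorized`, `b0` and `step` are hypothetical names, and it assumes a non-empty integer column. It does not generalize to arbitrary recursions the way the numba version does.

```python
import numpy as np
import pandas as pd


def make_b_vectorized(a, b0=1, step=2):
    # Hypothetical closed form of b[i] = min(b[i-1], a[i-1]) + step, b[0] = b0.
    # With c[i] = b[i] - step*i, c is a running minimum (assumes a is non-empty).
    a = np.asarray(a)
    n = len(a)
    # Running-minimum candidates: the seed b0, then a[k] - step*k for k = 0..n-2
    cand = np.concatenate(([b0], a[:-1] - step * np.arange(n - 1)))
    # c[i] = min(b0, a[0], a[1] - step, ..., a[i-1] - step*(i-1))
    c = np.minimum.accumulate(cand)
    # Undo the substitution to recover b
    return c + step * np.arange(n)


df = pd.DataFrame({'a': [1, 1, 5, 2, 7, 8, 16, 16, 16]})
df['b'] = make_b_vectorized(df['a'].to_numpy())
print(df)
```

On the example data this should reproduce the same 'b' column as the loop and the numba function.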
answered Sep 28 '22 07:09 by chrisb