I have a pandas dataframe with two columns A
,B
as below.
I want a vectorized solution for creating a new column C where C[i] = C[i-1] - A[i] + B[i]
.
df = pd.DataFrame(data={'A': [10, 2, 3, 4, 5, 6], 'B': [0, 1, 2, 3, 4, 5]})
>>> df
A B
0 10 0
1 2 1
2 3 2
3 4 3
4 5 4
5 6 5
Here is the solution using for-loops:
df['C'] = df['A']
for i in range(1, len(df)):
df['C'][i] = df['C'][i-1] - df['A'][i] + df['B'][i]
>>> df
A B C
0 10 0 10
1 2 1 9
2 3 2 8
3 4 3 7
4 5 4 6
5 6 5 5
... which does the job.
But since loops are slow in comparison to vectorized calculations, I want a vectorized solution for this in pandas:
I tried to use the shift()
method like this:
df['C'] = df['C'].shift(1).fillna(df['A']) - df['A'] + df['B']
but it didn't help since the shifted C column isn't updated with the calculation. It keeps its original values:
>>> df['C'].shift(1).fillna(df['A'])
0 10
1 10
2 2
3 3
4 4
5 5
and that produces a wrong result.
This can be vectorized since:
delta[i] = C[i] - C[i-1] = -A[i] +B[i]
. You can get delta
from A
and B
first, then...delta
(plus C[0]
) to get full C
Code as follows:
delta = df['B'] - df['A']
delta[0] = 0
df['C'] = df.loc[0, 'A'] + delta.cumsum()
print df
A B C
0 10 0 10
1 2 1 9
2 3 2 8
3 4 3 7
4 5 4 6
5 6 5 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With