Given a data frame df:
id_ val
11111 12
12003 22
88763 19
43721 77
...
I want to add a column diff to df. Each row of it equals the val in that row minus the diff in the previous row (the previous day), multiplied by 0.4, plus the diff in the previous row:
diff = (val - diff_previousDay) * 0.4 + diff_previousDay
The diff in the first row equals val * 0.4 in that row. That is, the expected df should be:
id_ val diff
11111 12 4.8
12003 22 11.68
88763 19 14.608
43721 77 ...
And I have tried:
mul = 0.4
df['diff'] = df.apply(lambda row: (row['val'] - df.loc[row.name, 'diff']) * mul + df.loc[row.name, 'diff'] if int(row.name) > 0 else row['val'] * mul, axis=1)
But I got this error:
TypeError: ("unsupported operand type(s) for -: 'float' and 'NoneType'", 'occurred at index 1')
Do you know how to solve this problem? Thank you in advance!
You can use:
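# Seed the first row: diff = val * 0.4
df.loc[0, 'diff'] = df.loc[0, 'val'] * 0.4
# Each later row depends on the diff computed for the row before it
for i in range(1, len(df)):
    df.loc[i, 'diff'] = (df.loc[i, 'val'] - df.loc[i-1, 'diff']) * 0.4 + df.loc[i-1, 'diff']
print(df)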
id_ val diff
0 11111 12 4.8000
1 12003 22 11.6800
2 88763 19 14.6080
3 43721 77 39.5648
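On a large frame, the repeated df.loc lookups inside the loop can become slow. As a rough sketch of one way around that, the same recurrence can be run over a plain Python list and assigned to the column once at the end. The sample frame below is rebuilt from the four rows visible in the question, and the names diffs and prev are made up for illustration:

```python
import pandas as pd

# Sample data: only the four rows shown in the question.
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})

mul = 0.4
diffs = []   # illustrative helper list, not part of the original answer
prev = 0.0   # starting from 0 makes the first row come out as val * mul

for val in df['val'].tolist():
    # Same formula as in the question: (val - previous diff) * mul + previous diff
    prev = (val - prev) * mul + prev
    diffs.append(prev)

# Assign the whole column in one step instead of cell by cell
df['diff'] = diffs
print(df)
```

Starting prev at 0 removes the need for a special case on the first row, since (val - 0) * 0.4 + 0 is just val * 0.4.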
The iterative nature of the calculation, where the inputs depend on the results of previous steps, complicates vectorization. You could perhaps use apply with a function that does the same calculation as the loop, but behind the scenes this would also be a loop.
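For this particular formula there may be a shortcut. Rearranged, (val - d) * 0.4 + d equals 0.4 * val + 0.6 * d, which is the recurrence pandas uses for an exponentially weighted mean with alpha=0.4 and adjust=False. The sketch below assumes that reading of the formula; it prepends a 0 so the first real row comes out as val * 0.4. The names padded and smoothed are illustrative, and the approach does not carry over to arbitrary recurrences:

```python
import pandas as pd

# Sample data: only the four rows shown in the question.
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})

# Prepend a 0 so the smoothing starts from diff = 0,
# which makes the first real row equal val * 0.4.
padded = pd.concat([pd.Series([0.0]), df['val']], ignore_index=True)

# With adjust=False: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
smoothed = padded.ewm(alpha=0.4, adjust=False).mean()

# Drop the padding row and assign by position
df['diff'] = smoothed.iloc[1:].to_numpy()
print(df)
```

On the sample rows this should agree with the loop version up to floating-point rounding.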