I'm using pandas in Python and I'm having trouble selecting some data. I have a DataFrame with float values, and I would like to create a column that contains the maximum (or minimum) of the n previous rows of a column, set to 0 for the first n rows. Here's an example of the result I would like to have:
import pandas as pd
df_test = pd.DataFrame({'a': [2, 7, 2, 0, -1, 19, -52, 2]})
df_test['result_i_want_with_n=3'] = [0, 0, 0, 7, 7, 2, 19, 19]
print(df_test)
    a  result_i_want_with_n=3
0   2                       0
1   7                       0
2   2                       0
3   0                       7
4  -1                       7
5  19                       2
6 -52                      19
7   2                      19
I managed to get this result using a while loop, but I would like to write it in a more "pandas" way to speed up the computation.
Thanks
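Rolling is your friend here. A rolling window of 3 puts its first value in the third row, because it includes the current row. You need to shift by one row to get your exact result, so that each row holds the maximum of the three rows before it; fillna(0) then sets the first three rows to 0.
df_test['a'].rolling(window=3).max().shift(1).fillna(0)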
0 0.0
1 0.0
2 0.0
3 7.0
4 7.0
5 2.0
6 19.0
7 19.0
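To store the result as a column and reuse the same idea for any n, or for the minimum, the one-liner above could be wrapped in a small helper. This is only a sketch: the function name previous_window_extreme, its how parameter, and the column names max_prev_3 and min_prev_3 are made up for illustration and are not part of pandas.

```python
import pandas as pd


def previous_window_extreme(series, n, how="max"):
    """Max (or min) of the n rows before each row; 0 for the first n rows.

    Hypothetical helper for illustration, not a pandas function.
    """
    rolled = series.rolling(window=n)
    agg = rolled.max() if how == "max" else rolled.min()
    # shift(1) moves each window's result down one row, so the current row
    # is excluded; the first n rows become NaN and are replaced by 0.
    return agg.shift(1).fillna(0)


df_test = pd.DataFrame({'a': [2, 7, 2, 0, -1, 19, -52, 2]})
df_test['max_prev_3'] = previous_window_extreme(df_test['a'], 3, "max")
df_test['min_prev_3'] = previous_window_extreme(df_test['a'], 3, "min")
print(df_test)
```

The new columns come back as floats, since the intermediate steps contain NaN; append .astype(int) if integer values are needed.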