I'm working with time-series price data, and for every row I want to know how high the price gets within the next K rows.
I can think of a way to do it using .argmax() while filtering the dataframe based on time, but there must be a simpler built-in solution.
For example:
Price
1 $10
2 $11
3 $15
4 $18
5 $13
6 $4
7 $25
For K=2, here's what I want:
Price Highest_In_Next_2_Rows
1 $10 $15
2 $11 $18
3 $15 $18
4 $18 $13
5 $13 $25
6 $4 $25
7 $25 NaN
You can achieve this using pandas' rolling and shift functions.
Essentially, you compute the rolling max over the previous k observations and then shift the result back by k rows, so that the value at row t is the max calculated over rows (t+1, ..., t+k).
import pandas as pd

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2
# Rolling max over the last k rows, then shift up by k so row t sees rows t+1..t+k
res = ts.rolling(k).max().shift(-k)
print(pd.concat([ts, res], axis=1))
output:
# 0 1
# 0 10 15.0
# 1 11 18.0
# 2 15 18.0
# 3 18 13.0
# 4 13 25.0
# 5 4 NaN
# 6 25 NaN
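To see why the shift lines things up, it can help to look at the intermediate rolling max before it is shifted. The following is only an illustrative sketch; the names rolled, shifted and steps, and the column labels, are mine and not part of the original answer.

```python
import pandas as pd

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2

# Backward-looking max: row t holds max(ts[t-k+1 .. t]).
# The first k-1 rows are NaN because the window is not full yet.
rolled = ts.rolling(k).max()

# Shifting by -k moves the value computed at row t+k up to row t,
# so row t now holds max(ts[t+1 .. t+k]). The last k rows become NaN.
shifted = rolled.shift(-k)

# Illustrative column labels (assumed, not from the original answer).
steps = pd.concat(
    [ts, rolled, shifted],
    axis=1,
    keys=["price", "rolling_max", "shifted"],
)
print(steps)
```

Reading across a row of steps, the value in the shifted column at row t is the rolling_max value from row t+k.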
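The problem with rolling(k).max().shift(-k) is that it returns NaN for the last k observations, even when some future rows are still available: row 5 above has one following price (25) but gets no result.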
A workaround is the following: take the series in reverse order and calculate the rolling max over the past k observations, with min_periods set to 1 so that a result is produced as soon as there is at least one observation. Then shift by one row, because you don't want the current row's price to be included, and reverse again to get back to the original order.
res = ts[::-1].rolling(k, min_periods=1).max().shift(1)[::-1]
This reproduces the desired output exactly:
# 0 1
# 0 10 15.0
# 1 11 18.0
# 2 15 18.0
# 3 18 13.0
# 4 13 25.0
# 5 4 25.0
# 6 25 NaN
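As an alternative to reversing the series twice, pandas also ships a forward-looking window indexer, FixedForwardWindowIndexer in pandas.api.indexers (available in reasonably recent pandas versions). Its window at row t covers rows t..t+k-1, so it includes the current row; shifting the result up by one row turns that into rows t+1..t+k. This is a sketch of a different technique, not the method from the answer above, and the names indexer, forward_max, res_forward and res_reversed are mine.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2

# Forward-looking window: at row t it covers rows t .. t+k-1 (current row included).
indexer = FixedForwardWindowIndexer(window_size=k)

# min_periods=1 keeps a result for the shorter windows near the end of the series.
forward_max = ts.rolling(indexer, min_periods=1).max()

# Shift up by one row so row t sees rows t+1 .. t+k instead of t .. t+k-1.
res_forward = forward_max.shift(-1)

# The reverse-and-roll workaround from the answer, for comparison.
res_reversed = ts[::-1].rolling(k, min_periods=1).max().shift(1)[::-1]

print(pd.concat([ts, res_forward, res_reversed], axis=1))
```

On this sample both result columns should agree row for row, with NaN only in the last row, where no future price exists.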