
Find highest value among the next K-consecutive rows in Pandas?

Tags: python, pandas

I'm working with time-series price data and, for every row, I want to know the highest price reached in the next K rows.

I can think of doing it with .argmax() while filtering the DataFrame based on time, but there must be a simpler built-in solution.

For example:

  Price
 1 $10
 2 $11
 3 $15
 4 $18
 5 $13
 6 $4
 7 $25

For K=2, here's what I want:

  Price   Highest_In_Next_2_Rows
1 $10     $15
2 $11     $18
3 $15     $18
4 $18     $13
5 $13     $25
6 $4      $25
7 $25     NaN
d9ngle asked Dec 23 '22 18:12


1 Answer

You can achieve this using the pandas rolling and shift functions.

Essentially, you compute the rolling max over the previous k observations, (t-k+1, ..., t), and then shift the series up by k rows with shift(-k), so that the value stored at t is the max calculated over (t+1, ..., t+k).

import pandas as pd

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2

# Rolling max over the last k rows, then shift the result up by k rows
res = ts.rolling(k).max().shift(-k)

# Show the prices next to the result
pd.concat([ts, res], axis=1)

output:

#     0     1
# 0  10  15.0
# 1  11  18.0
# 2  15  18.0
# 3  18  13.0
# 4  13  25.0
# 5   4   NaN
# 6  25   NaN

The problem with this solution is that it gives no result for the last k observations, even when some future rows exist (row 5 above still has one later price, 25, but gets NaN).
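To see where those missing values come from, the two steps can be inspected separately. This is a minimal sketch reusing the example series; the column names price, rolling_max and shifted are only illustrative.

```python
import pandas as pd

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2

# Step 1: backward-looking max over rows (t-k+1, ..., t).
# The first k-1 rows have no full window, so they are NaN.
rolled = ts.rolling(k).max()

# Step 2: move every value k rows up, so row t holds the max over (t+1, ..., t+k).
# Nothing is left to move into the last k rows, so they become NaN.
shifted = rolled.shift(-k)

# Illustrative column names, chosen only for this sketch.
steps = pd.DataFrame({"price": ts, "rolling_max": rolled, "shifted": shifted})
print(steps)
```

The NaN values at the end come from the shift itself, not from the rolling window, which is why a different arrangement of the same operations is needed to fill them.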

A workaround is the following: take the series in reverse order and calculate the rolling max over the past k observations, with min_periods=1 so that a result is produced as soon as there is at least one observation. Then lag by one row, since the current row's price should not be included, and reverse again to get back to the original order.

res = ts[::-1].rolling(k, min_periods=1).max().shift(1)[::-1]

This reproduces the desired output exactly:

#     0     1
# 0  10  15.0
# 1  11  18.0
# 2  15  18.0
# 3  18  13.0
# 4  13  25.0
# 5   4  25.0
# 6  25   NaN
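As a sanity check, the reverse-and-roll one-liner can be compared against an explicit loop that slices the next k rows for each position. This is only a sketch: the function names max_of_next_k and max_of_next_k_naive are made up for illustration and are not part of pandas.

```python
import pandas as pd

def max_of_next_k(ts: pd.Series, k: int) -> pd.Series:
    """Reverse, rolling max with min_periods=1, lag by one row, reverse back."""
    return ts[::-1].rolling(k, min_periods=1).max().shift(1)[::-1]

def max_of_next_k_naive(ts: pd.Series, k: int) -> pd.Series:
    """Reference version: take rows t+1..t+k explicitly for each t.

    An empty slice (the last row) has max NaN.
    """
    values = [ts.iloc[i + 1 : i + 1 + k].max() for i in range(len(ts))]
    return pd.Series(values, index=ts.index, dtype="float64")

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])

# Both versions should agree for several window sizes.
for k in (1, 2, 3):
    pd.testing.assert_series_equal(max_of_next_k(ts, k), max_of_next_k_naive(ts, k))

print(max_of_next_k(ts, 2))
```

The loop is slower on long series but easy to reason about, which makes it a convenient reference when changing k or adapting the one-liner to a DataFrame column.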
FLab answered Mar 09 '23 00:03