I have the following dataframe:
date value
2014-01-20 10
2014-01-21 12
2014-01-22 13
2014-01-23 9
2014-01-24 7
2014-01-25 12
2014-01-26 11
I need to be able to keep track of when the latest maximum and minimum value occurred within a specific rolling window. For example if I were to use a rolling window period of 5, then I would need an output like the following:
date value rolling_max_date rolling_min_date
2014-01-20 10 2014-01-20 2014-01-20
2014-01-21 12 2014-01-21 2014-01-20
2014-01-22 13 2014-01-22 2014-01-20
2014-01-23 9 2014-01-22 2014-01-23
2014-01-24 7 2014-01-22 2014-01-24
2014-01-25 12 2014-01-22 2014-01-24
2014-01-26 11 2014-01-25 2014-01-24
All this shows is, what is the date of the latest maximum and minimum value within the rolling window. I know pandas has rolling_min and rolling_max, but im not sure how to keep track of the index/date of when the most recent max/min occured within the window.
The min is simply the lowest observation, while the max is the highest observation.
The largest value in a data set is often called the maximum (or max for short), and the smallest value is called the minimum (or min). The difference between the maximum and minimum value is sometimes called the range and is calculated by subtracting the smallest value from the largest value.
Calling min() and max() With a Single Iterable Argument In these examples, you call min() and max() with a list of integer numbers and then with an empty list. The first call to min() returns the smallest number in the input list, -5 . In contrast, the first call to max() returns the largest number in the list, or 9 .
In R, we can find the minimum or maximum value of a vector or data frame. We use the min() and max() function to find minimum and maximum value respectively. The min() function returns the minimum value of a vector or data frame. The max() function returns the maximum value of a vector or data frame.
There is a more general rolling_apply
where you can provide your own function. However, the custom functions receives the windows as arrays, not dataframes, so the index information is not available (so you cannot use idxmin/max
).
But lets try to achieve this in two steps:
In [41]: df = df.set_index('date')
In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
Out[42]:
value
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 3
2014-01-26 2
This gives you the index in the window where the minimum is found. But, this index is for that particular window and not for the entire dataframe. So let's add the start of the window, and then use this integer location to retrieve the correct index locations index:
In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
In [47]: ilocs
Out[47]:
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 4
2014-01-26 4
Name: value, dtype: float64
In [48]: df.index.take(ilocs)
Out[48]:
Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
u'2014-01-24', u'2014-01-24', u'2014-01-24'],
dtype='object', name=u'date')
In [49]: df['rolling_min_date'] = df.index.take(ilocs)
In [50]: df
Out[50]:
value rolling_min_date
date
2014-01-20 10 2014-01-20
2014-01-21 12 2014-01-20
2014-01-22 13 2014-01-20
2014-01-23 9 2014-01-23
2014-01-24 7 2014-01-24
2014-01-25 12 2014-01-24
2014-01-26 11 2014-01-24
The same can be done for the maximum:
ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmax(), min_periods=1)
ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
df['rolling_max_date'] = df.index.take(ilocs)
Here is a workaround.
import pandas as pd
import numpy as np
# sample data
# ===============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
df
value
2015-01-01 13
2015-01-02 16
2015-01-03 22
2015-01-04 1
2015-01-05 4
2015-01-06 28
2015-01-07 4
2015-01-08 8
2015-01-09 10
2015-01-10 20
2015-01-11 22
2015-01-12 19
2015-01-13 5
2015-01-14 24
2015-01-15 7
2015-01-16 25
2015-01-17 25
2015-01-18 13
2015-01-19 27
2015-01-20 2
# processing
# ==========================================
# your cumstom function to track on max/min value/date
def track_minmax(df):
return pd.Series({'current_date': df.index[-1], 'rolling_max_val': df['value'].max(), 'rolling_max_date': df['value'].idxmax(), 'rolling_min_val': df['value'].min(), 'rolling_min_date': df['value'].idxmin()})
window = 5
# use list comprehension to do the for loop
pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)
rolling_max_date rolling_max_val rolling_min_date rolling_min_val
2015-01-01 NaT NaN NaT NaN
2015-01-02 NaT NaN NaT NaN
2015-01-03 NaT NaN NaT NaN
2015-01-04 NaT NaN NaT NaN
2015-01-05 2015-01-03 22 2015-01-04 1
2015-01-06 2015-01-06 28 2015-01-04 1
2015-01-07 2015-01-06 28 2015-01-04 1
2015-01-08 2015-01-06 28 2015-01-04 1
2015-01-09 2015-01-06 28 2015-01-05 4
2015-01-10 2015-01-06 28 2015-01-07 4
2015-01-11 2015-01-11 22 2015-01-07 4
2015-01-12 2015-01-11 22 2015-01-08 8
2015-01-13 2015-01-11 22 2015-01-13 5
2015-01-14 2015-01-14 24 2015-01-13 5
2015-01-15 2015-01-14 24 2015-01-13 5
2015-01-16 2015-01-16 25 2015-01-13 5
2015-01-17 2015-01-16 25 2015-01-13 5
2015-01-18 2015-01-16 25 2015-01-15 7
2015-01-19 2015-01-19 27 2015-01-15 7
2015-01-20 2015-01-19 27 2015-01-20 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With