Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas idxmax() doesn't work on Series grouped by time periods which contain NaNs

Tags:

python

pandas

I have a Series that contains scalar values indexes by days over several years. For some years there are not data.

2014-10-07    5036.883410
2013-10-11    5007.515654
2013-10-27    5020.184053
2014-09-12    5082.379630
2014-10-14    5032.669801
2014-10-30    5033.276159
2016-10-03    5046.921912
2016-10-19    5141.861889
2017-10-06    5266.138810

From this I want to get 1. the maximum for each year 2. the day of the maximum for each year For those years where there are not data, there should be a nan.

To resolve 1. the following works:

import pandas as pd
import numpy as np

data= pd.Series( index=pd.DatetimeIndex(['2014-10-07', '2013-10-11', '2013-10-27', '2014-09-12', '2014-10-14', '2014-10-30', '2016-10-03', '2016-10-19', '2017-10-06'], dtype='datetime64[ns]', name='time', freq=None), data=np.array([5036.88341035, 5007.51565355, 5020.18405295, 5082.37963023, 5032.66980146, 5033.27615931, 5046.92191246, 5141.86188915, 5266.1388102 ]))

# get maximum of each year  
data.resample('A').max()

However, I tried different options to get the index of the date with the maximum, but they all failed:

data.resample('A').idxmax()

This raises the following Attribute error:

AttributeError: 'DatetimeIndexResampler' object has no attribute 'idxmax'

Then I tried the following:

data.groupby(pd.TimeGrouper('A')).idxmax()

but this gave an ValueError without specification. I then found this workaround:

data.groupby(pd.TimeGrouper('A')).agg( lambda x : x.idxmax() )

but I did not wore either for temporally grouped data:

ValueError: attempt to get argmax of an empty sequence

Apparently the reported bug has not been fixed yet and the suggested workaround for categorical data does not seem to work for temporally grouped/resampled data.

Can anyone provide a suitable workaround for this case or maybe an entirely different (and efficient) solution approach to the above problem?

Thanks in advance!

like image 407
Jan N Avatar asked Mar 06 '23 03:03

Jan N


1 Answers

The problem is that you have no records during 2015, but a time period for 2015 is created since it is inside your years' range. You need to manually process this case:

data.resample('A').agg(
    lambda x : np.nan if x.count() == 0 else x.idxmax()
)

Output:

time
2013-12-31   2013-10-27
2014-12-31   2014-09-12
2015-12-31          NaT
2016-12-31   2016-10-19
2017-12-31   2017-10-06
Freq: A-DEC, dtype: datetime64[ns]
like image 136
Sasha Tsukanov Avatar answered Apr 27 '23 16:04

Sasha Tsukanov