I have a dataframe of daily observations from 01-01-1973 to 12-31-2014.
I have been using pandas Grouper and everything has worked fine for each frequency until now: I want to group them by decade (70s, 80s, 90s, etc.).
I tried to do it as
import pandas as pd
df.groupby(pd.Grouper(freq = '10Y')).mean()
However, this groups them in 73-83, 83-93, etc.
The pandas GroupBy.nth() function is used to get the value corresponding to the nth row of each group. To get the first value in a group, pass 0 as the argument to nth().
A Grouper allows the user to specify a groupby instruction for an object. This specification will select a column via the key parameter, or, if the level and/or axis parameters are given, a level of the index of the target object.
When grouping numbers with pd.cut, bins can be a sequence of numbers denoting the endpoints of the left-open intervals into which the data is divided. For instance, bins=[19, 40, 65, np.inf] creates three age groups: (19, 40], (40, 65], and (65, np.inf].
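As a rough illustration of the bins description above, here is a minimal sketch. The ages are made up for the example, and it relies on the pd.cut default of right=True, which closes each interval on the right.

```python
import numpy as np
import pandas as pd

# Made-up ages, chosen so that some fall exactly on a bin edge.
ages = pd.Series([20, 40, 41, 65, 70])

# Default right=True: intervals are open on the left and closed on the right,
# so 40 lands in (19, 40] and 65 lands in (40, 65].
groups = pd.cut(ages, bins=[19, 40, 65, np.inf])

# Count how many ages fall in each interval, in bin order.
counts = groups.value_counts(sort=False)
print(counts)
```

Passing right=False instead would flip the closed side, which is what the decade example with pd.date_range relies on.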
pd.cut also works to specify a regular frequency with a specified start year.
import pandas as pd
df
date val
0 1970-01-01 00:01:18 1
1 1979-12-31 18:01:01 12
2 1980-01-01 00:00:00 2
3 1989-01-01 00:00:00 3
4 2014-05-06 00:00:00 4
df.groupby(pd.cut(df.date, pd.date_range('1970', '2020', freq='10YS'), right=False)).mean()
# val
#date
#[1970-01-01, 1980-01-01) 6.5
#[1980-01-01, 1990-01-01) 2.5
#[1990-01-01, 2000-01-01) NaN
#[2000-01-01, 2010-01-01) NaN
#[2010-01-01, 2020-01-01) 4.0
You can do a little arithmetic on the year to floor it to the nearest decade:
df.groupby(df.index.year // 10 * 10).mean()
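To see what that arithmetic does, here is a small self-contained sketch. It assumes the frame has a DatetimeIndex; the column name data and the values are invented for the example.

```python
import numpy as np
import pandas as pd

# Synthetic daily data over the date range from the question (values are made up).
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": np.arange(len(idx), dtype=float)}, index=idx)

# Integer-divide the year by 10, then multiply back: 1973 -> 1970, 1989 -> 1980.
decade = df.index.year // 10 * 10

# Group on the computed decade labels; the first group covers only 1973-1979.
by_decade = df.groupby(decade).mean()
print(by_decade)
```

The group keys come out as plain integers (1970, 1980, ...), so the first and last groups are simply shorter rather than padded.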
@cᴏʟᴅsᴘᴇᴇᴅ's method is cleaner than this, but keeping your pd.Grouper method, one way to do this is to merge your data with a new date range that starts at the beginning of a decade and ends at the end of a decade, then use your Grouper on that. For example, given an initial df:
date data
0 1973-01-01 -1.097895
1 1973-01-02 0.834253
2 1973-01-03 0.134698
3 1973-01-04 -1.211177
4 1973-01-05 0.366136
...
15335 2014-12-27 -0.566134
15336 2014-12-28 -1.100476
15337 2014-12-29 0.115735
15338 2014-12-30 1.635638
15339 2014-12-31 1.930645
Merge that with a date_range dataframe ranging from the start of 1970 to the end of 2019:
new_df = pd.DataFrame({'date':pd.date_range(start='01-01-1970', end='12-31-2019', freq='D')})
df = new_df.merge(df, on ='date', how='left')
And use your Grouper:
df.groupby(pd.Grouper(key='date', freq='10YS')).mean()  # 'YS' = year start; the older 'AS' alias is deprecated in pandas 2.2+
Which gives you:
data
date
1970-01-01 -0.005455
1980-01-01 0.028066
1990-01-01 0.011122
2000-01-01 0.011213
2010-01-01 0.029592
The same, but in one go, could look like this:
(df.merge(pd.DataFrame(
              {'date': pd.date_range(start='01-01-1970',
                                     end='12-31-2019',
                                     freq='D')}),
          how='right')
   .groupby(pd.Grouper(key='date', freq='10YS'))  # 'YS' replaces the deprecated 'AS' alias
   .mean())
Something like this, which takes the first three characters of the date string (e.g. '197') and appends '0':
df.groupby(df.index.astype(str).str[:3] + '0').mean()
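A minimal sketch of the string-prefix idea, assuming a DatetimeIndex whose string form starts with a four-digit year (e.g. '1973-01-01'), so that the first three characters plus '0' give a decade label. The column name data and the values are made up.

```python
import numpy as np
import pandas as pd

# Synthetic daily data over the date range from the question (values are made up).
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": np.ones(len(idx))}, index=idx)

# '1973-01-01'[:3] is '197'; appending '0' gives the label '1970'.
labels = df.index.astype(str).str[:3] + "0"

# Group on the string labels and average each decade.
by_decade = df.groupby(labels).mean()
print(by_decade)
```

Note that the group keys here are strings rather than integers or timestamps, which matters if you want to sort or join on them later.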