Grouping DataFrame by start of decade using pandas Grouper

I have a dataframe of daily observations from 01-01-1973 to 12-31-2014.

I have been using pandas Grouper and it has worked fine for every frequency until now: I want to group the observations by decade (70s, 80s, 90s, etc.).

I tried to do it as

import pandas as pd
df.groupby(pd.Grouper(freq = '10Y')).mean()

However, this groups them in 73-83, 83-93, etc.

ForeignVolatility asked May 03 '18 02:05



4 Answers

pd.cut also works: build the bin edges at a regular frequency from a chosen start year, then group on the resulting intervals.

import pandas as pd
df
                 date  val
0 1970-01-01 00:01:18    1
1 1979-12-31 18:01:01   12
2 1980-01-01 00:00:00    2
3 1989-01-01 00:00:00    3
4 2014-05-06 00:00:00    4

df.groupby(pd.cut(df.date, pd.date_range('1970', '2020', freq='10YS'), right=False)).mean()
#                          val
#date                         
#[1970-01-01, 1980-01-01)  6.5
#[1980-01-01, 1990-01-01)  2.5
#[1990-01-01, 2000-01-01)  NaN
#[2000-01-01, 2010-01-01)  NaN
#[2010-01-01, 2020-01-01)  4.0
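
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the small frame shown above and derives the bin edges from the data instead of hard-coding 1970 and 2020. The names `first_decade`, `last_decade`, `edges` and `decade_means` are illustrative, and passing `observed=False` is an assumption made so that empty decades stay in the result.

```python
import pandas as pd

# Rebuild the example frame from the answer above.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1970-01-01 00:01:18",
        "1979-12-31 18:01:01",
        "1980-01-01 00:00:00",
        "1989-01-01 00:00:00",
        "2014-05-06 00:00:00",
    ]),
    "val": [1, 12, 2, 3, 4],
})

# Round the first year down and the last year up to decade boundaries.
first_decade = df["date"].dt.year.min() // 10 * 10
last_decade = df["date"].dt.year.max() // 10 * 10 + 10

# Edges at 1970-01-01, 1980-01-01, ..., 2020-01-01.
edges = pd.date_range(str(first_decade), str(last_decade), freq="10YS")

# right=False makes each bin closed on the left: [1970-01-01, 1980-01-01), ...
bins = pd.cut(df["date"], edges, right=False)

# observed=False keeps decades that contain no rows (their mean is NaN).
decade_means = df.groupby(bins, observed=False)["val"].mean()
print(decade_means)
```

The means should match the table above: 6.5 for the 1970s, 2.5 for the 1980s, NaN for the two empty decades and 4.0 for the 2010s.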

ALollz answered Oct 21 '22 20:10


You can do a little arithmetic on the year to floor it to the start of its decade (this assumes the dates are in a DatetimeIndex):

df.groupby(df.index.year // 10 * 10).mean()
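
A self-contained sketch of the same idea, in case it helps to see it run. The frame below is a hypothetical stand-in for the question's data (one value per day from 1973 through 2014), and the column name `data` is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one value per day.
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": np.arange(len(idx), dtype=float)}, index=idx)

# Integer division drops the last digit of the year:
# 1973 // 10 * 10 == 1970, 1989 // 10 * 10 == 1980, and so on.
decade = df.index.year // 10 * 10

decade_means = df.groupby(decade).mean()
print(decade_means.index.tolist())  # [1970, 1980, 1990, 2000, 2010]
```

If the dates live in a column rather than the index, `df['date'].dt.year // 10 * 10` should give the same keys.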

cs95 answered Oct 21 '22 19:10


@cᴏʟᴅsᴘᴇᴇᴅ's method is cleaner than this, but if you want to keep your pd.Grouper approach, one way is to merge your data with a new date range that starts at the beginning of a decade and ends at the end of a decade, then use your Grouper on the result. For example, given an initial df:

        date      data
0     1973-01-01 -1.097895
1     1973-01-02  0.834253
2     1973-01-03  0.134698
3     1973-01-04 -1.211177
4     1973-01-05  0.366136
...
15335 2014-12-27 -0.566134
15336 2014-12-28 -1.100476
15337 2014-12-29  0.115735
15338 2014-12-30  1.635638
15339 2014-12-31  1.930645

Merge that with a date_range DataFrame covering 1970-01-01 through 2019-12-31:

new_df = pd.DataFrame({'date':pd.date_range(start='01-01-1970', end='12-31-2019', freq='D')})

df = new_df.merge(df, on ='date', how='left')

And use your Grouper:

# 'YS' is the year-start alias; the older 'AS' spelling is deprecated in recent pandas
df.groupby(pd.Grouper(key='date', freq='10YS')).mean()

Which gives you:

                data
date                
1970-01-01 -0.005455
1980-01-01  0.028066
1990-01-01  0.011122
2000-01-01  0.011213
2010-01-01  0.029592

The same, but in one go, could look like this:

(df.merge(pd.DataFrame(
    {'date': pd.date_range(start='01-01-1970',
                           end='12-31-2019',
                           freq='D')}),
          on='date', how='right')
 # 'YS' is the year-start alias; the older 'AS' spelling is deprecated in recent pandas
 .groupby(pd.Grouper(key='date', freq='10YS'))
 .mean())
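
A hedged variation on the padding idea: if the dates are already a DatetimeIndex, `reindex` can stand in for the merge. This is a sketch on made-up data, not the exact frame above, and it cross-checks the Grouper result against plain year arithmetic. The names `full_range`, `padded`, `by_grouper` and `by_floor` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering the same span as the question.
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": np.arange(len(idx), dtype=float)}, index=idx)

# Pad the index out to whole decades. The added days hold NaN,
# which mean() skips by default.
full_range = pd.date_range("1970-01-01", "2019-12-31", freq="D")
padded = df.reindex(full_range)

# Because the padded index starts on 1970-01-01, the 10-year bins
# start on decade boundaries.
by_grouper = padded.groupby(pd.Grouper(freq="10YS")).mean()

# Cross-check against flooring the year on the unpadded data.
by_floor = df.groupby(df.index.year // 10 * 10).mean()

print(by_grouper)
```

The padding only anchors the first bin; it does not change the means, since the extra rows are all NaN.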

sacuL answered Oct 21 '22 18:10


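Something like this, which takes the first three digits of the year from the string form of the index and appends '0' (so '1973-01-01' becomes '1970'; note the group keys are strings):

df.groupby(df.index.astype(str).str[:3]+'0').mean()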

BENY answered Oct 21 '22 20:10