Grouping DataFrame by start of decade using pandas Grouper

I have a dataframe of daily observations from 01-01-1973 to 12-31-2014.

I have been using pd.Grouper, and it has worked fine for every frequency until now: I want to group the observations by calendar decade (70s, 80s, 90s, etc.).

I tried to do it as

import pandas as pd
df.groupby(pd.Grouper(freq = '10Y')).mean()

However, this groups them into 1973-1983, 1983-1993, etc., because the bins start at the first observation rather than at the start of a decade.

asked May 03 '18 02:05

4 Answers

pd.cut also works: it lets you bin the dates at a regular frequency from a start year that you choose.

import pandas as pd
df
                 date  val
0 1970-01-01 00:01:18    1
1 1979-12-31 18:01:01   12
2 1980-01-01 00:00:00    2
3 1989-01-01 00:00:00    3
4 2014-05-06 00:00:00    4

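# observed=False keeps the empty decades; [['val']] keeps the date column out of the mean
bins = pd.date_range('1970', '2020', freq='10YS')
df.groupby(pd.cut(df['date'], bins, right=False), observed=False)[['val']].mean()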
#                          val
#date                         
#[1970-01-01, 1980-01-01)  6.5
#[1980-01-01, 1990-01-01)  2.5
#[1990-01-01, 2000-01-01)  NaN
#[2000-01-01, 2010-01-01)  NaN
#[2010-01-01, 2020-01-01)  4.0

answered Oct 21 '22 20:10

@cᴏʟᴅsᴘᴇᴇᴅ's method is cleaner than this, but if you want to keep your pd.Grouper approach, one option is to merge your data onto a new date range that starts at the beginning of a decade and ends at the end of one, then use your Grouper on the result. For example, given an initial df:

        date      data
0     1973-01-01 -1.097895
1     1973-01-02  0.834253
2     1973-01-03  0.134698
3     1973-01-04 -1.211177
4     1973-01-05  0.366136
...
15335 2014-12-27 -0.566134
15336 2014-12-28 -1.100476
15337 2014-12-29  0.115735
15338 2014-12-30  1.635638
15339 2014-12-31  1.930645

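Merge that with a date_range dataframe running from the start of 1970 to the end of 2019:

new_df = pd.DataFrame({'date': pd.date_range(start='01-01-1970', end='12-31-2019', freq='D')})

# The left merge keeps every day in the padded range; days missing from df get NaN
df = new_df.merge(df, on='date', how='left')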

And use your Grouper:

# 'YS' is the year-start alias; recent pandas releases deprecate the older 'AS' spelling
df.groupby(pd.Grouper(key='date', freq='10YS')).mean()

Which gives you:

                data
date                
1970-01-01 -0.005455
1980-01-01  0.028066
1990-01-01  0.011122
2000-01-01  0.011213
2010-01-01  0.029592

The same, but in one go, could look like this:

# The right merge keeps every day of the padded 1970-2019 range
(df.merge(pd.DataFrame(
    {'date': pd.date_range(start='01-01-1970',
                           end='12-31-2019',
                           freq='D')}),
          on='date', how='right')
 .groupby(pd.Grouper(key='date', freq='10YS'))
 .mean())
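For anyone who wants to try the padding approach end to end, here is a self-contained sketch. The random data is only a stand-in for the question's real observations, and the names full, padded and by_decade are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one value per day, 1973-2014
rng = np.random.default_rng(0)
dates = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"date": dates, "data": rng.standard_normal(len(dates))})

# Pad to whole decades so that the first bin starts on 1970-01-01
full = pd.DataFrame({"date": pd.date_range("1970-01-01", "2019-12-31", freq="D")})
padded = full.merge(df, on="date", how="left")   # padding days get NaN in 'data'

# Ten-year bins anchored on the first (padded) date; mean() skips the NaN padding
by_decade = padded.groupby(pd.Grouper(key="date", freq="10YS"))["data"].mean()
print(by_decade)
```

Because mean() ignores NaN, the padding rows only move the bin edges and do not change the averages.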

answered Oct 21 '22 18:10

sacuL

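Something like this, assuming the index is a DatetimeIndex (the first three characters of the year plus '0' give '1970', '1980', and so on):

df.groupby(df.index.astype(str).str[:3] + '0').mean()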
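The same idea can be written with integer arithmetic on the year instead of string slicing. This is only a sketch under the assumption that the frame has a DatetimeIndex; the sample values and the names decade and decade_means are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex
df = pd.DataFrame(
    {"val": [1, 12, 2, 3, 4]},
    index=pd.to_datetime(
        ["1973-01-01", "1979-12-31", "1980-01-01", "1989-01-01", "2014-05-06"]
    ),
)

# Floor each year to its decade: 1973 -> 1970, 1989 -> 1980, 2014 -> 2010
decade = df.index.year // 10 * 10

decade_means = df.groupby(decade)["val"].mean()
print(decade_means)
```

Unlike the pd.cut version, this only returns decades that actually contain rows, so empty decades such as the 1990s do not appear in the result.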

answered Oct 21 '22 20:10

BENY

Tags: python, pandas, group-by, pandas-groupby

Asked by: ForeignVolatility

Answers by: ALollz, cs95, sacuL, BENY