Given the pandas DataFrame below:
In [115]: times = pd.to_datetime(pd.Series(['2014-08-25 21:00:00',
                                            '2014-08-25 21:04:00',
                                            '2014-08-25 22:07:00',
                                            '2014-08-25 22:09:00']))
          locations = ['HK', 'LDN', 'LDN', 'LDN']
          event = ['foo', 'bar', 'baz', 'qux']
          df = pd.DataFrame({'Location': locations, 'Event': event}, index=times)
          df
Out[115]:
                    Event Location
2014-08-25 21:00:00   foo       HK
2014-08-25 21:04:00   bar      LDN
2014-08-25 22:07:00   baz      LDN
2014-08-25 22:09:00   qux      LDN
I would like to resample the data, aggregating hourly by count while grouping by location, to produce a DataFrame that looks like this:
Out[115]:
                     HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
I've tried various combinations of resample() and groupby() but with no luck. How would I go about this?
In my original post, I suggested using pd.TimeGrouper. Nowadays, use pd.Grouper instead of pd.TimeGrouper. The syntax is largely the same, but TimeGrouper is now deprecated in favor of pd.Grouper.

Moreover, while pd.TimeGrouper could only group by DatetimeIndex, pd.Grouper can also group by datetime columns, which you can specify through the key parameter.
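For instance, here is a minimal sketch of grouping by a datetime column rather than by the index (the DataFrame and its 'time' column are made up purely to illustrate the key parameter):

import pandas as pd

# Hypothetical frame whose timestamps live in an ordinary column,
# not in a DatetimeIndex.
df2 = pd.DataFrame({
    'time': pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                            '2014-08-25 22:07:00']),
    'Location': ['HK', 'LDN', 'LDN'],
    'Event': ['foo', 'bar', 'baz'],
})

# key= names the datetime column to bucket by.
# (Note: pandas 2.2+ prefers the lowercase alias 'h' over 'H'.)
df2.groupby([pd.Grouper(key='time', freq='1H'), 'Location'])['Event'].count()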
You could use a pd.Grouper to group the DatetimeIndex'ed DataFrame by hour:
grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])
use count to count the number of events in each group:
grouper['Event'].count()
#                      Location
# 2014-08-25 21:00:00  HK          1
#                      LDN         1
# 2014-08-25 22:00:00  LDN         2
# Name: Event, dtype: int64
use unstack to move the Location index level to a column level:
grouper['Event'].count().unstack()
# Out[49]:
# Location             HK  LDN
# 2014-08-25 21:00:00   1    1
# 2014-08-25 22:00:00  NaN    2
and then use fillna to change the NaNs into zeros.
Putting it all together,
grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])
result = grouper['Event'].count().unstack('Location').fillna(0)
yields
Location             HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
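As an aside, unstack accepts a fill_value argument, so the unstack/fillna steps can be collapsed into one. This also keeps the counts as integers, whereas fillna(0) on a column that picked up NaNs leaves you with floats:

result = grouper['Event'].count().unstack('Location', fill_value=0)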
There are two options for doing this. They can actually give different results depending on your data; a sketch of how they diverge follows the final output below. The first option groups by Location and, within each Location, resamples by hour. The second option groups by Location and hour at the same time.
Option 1: Use groupby + resample
grouped = df.groupby('Location').resample('H')['Event'].count()
Option 2: Group both the location and DatetimeIndex together with groupby(pd.Grouper)
grouped = df.groupby(['Location', pd.Grouper(freq='H')])['Event'].count()
They both will result in the following:
Location
HK        2014-08-25 21:00:00    1
LDN       2014-08-25 21:00:00    1
          2014-08-25 22:00:00    2
Name: Event, dtype: int64
And then reshape:
grouped.unstack('Location', fill_value=0)
Will output
Location             HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
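As for where the two options can diverge: Option 1 resamples each Location over its own time span, so an empty hour inside a group's range still appears with a count of 0, while Option 2 only emits the (Location, hour) combinations that actually contain rows. A minimal sketch with made-up data (the outputs in the comments are what I'd expect on recent pandas):

import pandas as pd

# Hypothetical data: LDN has events at 21:00 and 23:00 but nothing at 22:00.
gap_times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 23:00:00'])
gap_df = pd.DataFrame({'Location': ['LDN', 'LDN'], 'Event': ['a', 'b']},
                      index=gap_times)

# Option 1: resample fills the empty 22:00 bin within the group's span.
gap_df.groupby('Location').resample('H')['Event'].count()
# Location
# LDN       2014-08-25 21:00:00    1
#           2014-08-25 22:00:00    0
#           2014-08-25 23:00:00    1
# Name: Event, dtype: int64

# Option 2: pd.Grouper emits only non-empty groups, so 22:00 is absent.
gap_df.groupby(['Location', pd.Grouper(freq='H')])['Event'].count()
# Location
# LDN       2014-08-25 21:00:00    1
#           2014-08-25 23:00:00    1
# Name: Event, dtype: int64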