Pandas groupby month and year

Tags: python, pandas

I have the following dataframe:

    Date        abc    xyz
    01-Jun-13   100    200
    03-Jun-13   -20    50
    15-Aug-13   40     -5
    20-Jan-14   25     15
    21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will use the grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum but just can't seem to get anything to work.

Thank you for any assistance.

asked Oct 30 '14 06:10 by darkpool



2 Answers

You can use either resample or Grouper (which resamples under the hood).

First make sure that the datetime column is actually of datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:
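As a minimal sketch of that preparation step, the following builds the frame from the question and converts the Date column. The format string "%d-%b-%y" is an assumption based on the sample values (and assumes English month abbreviations); the names df and df1 simply mirror the sessions in this answer.

```python
import pandas as pd

# Sample data copied from the question; the Date values start out as plain strings.
df = pd.DataFrame(
    {
        "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

# Parse the strings into datetime64 values.
# Assumed format: day, abbreviated month name, two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Move the parsed column into the index so the frame has a DatetimeIndex.
df1 = df.set_index("Date")

print(df1.index.dtype)
```

With the dates in the index, both groupby with a Grouper and resample can be applied directly to df1.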

    In [11]: df1
    Out[11]:
                abc  xyz
    Date
    2013-06-01  100  200
    2013-06-03  -20   50
    2013-08-15   40   -5
    2014-01-20   25   15
    2014-02-21   60   80

    In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

    In [13]: g.sum()
    Out[13]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

    In [14]: df1.resample("M").sum()  # the same
    Out[14]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

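Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21. Likewise, resample("M", how='sum') is the older spelling of resample("M").sum(). The NaN rows in the output come from an older pandas; since 0.22, sum() over an empty month returns 0 unless you pass min_count=1.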
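For readers on a recent pandas, here is a hedged sketch of the same two calls. It assumes the "MS" (month start) alias instead of "M", purely because "MS" is accepted across versions, so the bins are labelled by the first day of each month rather than the last. Passing min_count=1 is also an assumption, made to keep empty months as NaN.

```python
import pandas as pd

# Same data as df1 above, rebuilt here so the example is self-contained.
df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.DatetimeIndex(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"],
        name="Date",
    ),
)

# Group into calendar months; "MS" labels each bin by the month's first day.
# min_count=1 leaves months with no rows as NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq="MS")).sum(min_count=1)

# resample is the equivalent, more direct spelling.
by_resample = df1.resample("MS").sum(min_count=1)

print(by_grouper)
```

Both results cover every month from June 2013 to February 2014, including the months that have no rows.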


I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.). I'm including this for interest's sake.

If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

    In [21]: df
    Out[21]:
            Date  abc  xyz
    0 2013-06-01  100  200
    1 2013-06-03  -20   50
    2 2013-08-15   40   -5
    3 2014-01-20   25   15
    4 2014-02-21   60   80

    In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
    Out[22]:
    <class 'pandas.tseries.period.PeriodIndex'>
    [2013-06, ..., 2014-02]
    Length: 5, Freq: M

    In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

    In [24]: g = df.groupby(per)

    In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
    Out[25]:
             abc  xyz
    2013-06   80  250
    2013-08   40   -5
    2014-01   25   15
    2014-02   60   80

To get the desired result we have to reindex...
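The answer does not show the reindex step, so what follows is only one possible sketch, not necessarily what the author had in mind. It assumes the gaps should span from the first to the last observed month, and the names monthly, full_range and filled are hypothetical.

```python
import pandas as pd

# Same data as df above, with Date already converted to datetime64.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
        ),
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

# Group by monthly period, summing only the numeric columns.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Build every monthly period between the first and last observed month,
# then reindex so the missing months appear as NaN rows.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
filled = monthly.reindex(full_range)

print(filled)
```

Selecting the abc and xyz columns before summing avoids asking pandas to sum the datetime column, which recent versions refuse to do.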

answered Sep 20 '22 22:09 by Andy Hayden


Why not keep it simple?!

GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum() 

giving you,

    print(GB)
            abc  xyz
    2013 6   80  250
         8   40   -5
    2014 1   25   15
         2   60   80

and then you can plot like asked using,

GB.plot('abc','xyz',kind='scatter') 
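For completeness, a self-contained sketch of this approach, assuming DF has a DatetimeIndex as in the first answer. The level names "year" and "month" are added here for readability and are not part of the original one-liner. The plot call is left as a comment because it needs matplotlib installed.

```python
import pandas as pd

# Frame from the question with the dates already parsed into the index.
DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.DatetimeIndex(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Group on two keys derived from the index: calendar year and month number.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()

# Name the two index levels (an addition for readability).
GB = GB.rename_axis(["year", "month"])

print(GB)

# Scatter plot of abc against xyz, one point per year/month (requires matplotlib):
# GB.plot("abc", "xyz", kind="scatter")
```

Unlike resample, this only produces rows for months that actually occur in the data.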
answered Sep 21 '22 22:09 by Q-man