Pandas groupby month and year

Tags:

python

pandas

I have the following dataframe:

    Date        abc    xyz
    01-Jun-13   100    200
    03-Jun-13   -20    50
    15-Aug-13   40     -5
    20-Jan-14   25     15
    21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will use the grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum but just can't seem to get anything to work.

Thank you for any assistance.

776

asked Oct 30 '14 06:10

darkpool

2 Answers

You can use either resample or Grouper (which resamples under the hood).

First make sure that the date column actually holds datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

    In [11]: df1
    Out[11]:
                abc  xyz
    Date
    2013-06-01  100  200
    2013-06-03  -20   50
    2013-08-15   40   -5
    2014-01-20   25   15
    2014-02-21   60   80

    In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

    In [13]: g.sum()
    Out[13]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

    In [14]: df1.resample("M").sum()  # the same
    Out[14]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since pandas 0.21.
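For anyone who wants to run the session above from scratch, here is a minimal sketch that starts from the question's raw strings. The names raw, month_end, by_grouper and by_resample are made up for illustration, and the date format string is an assumption based on the sample rows. Passing a pd.offsets.MonthEnd() object instead of the "M" string is my own choice, since newer pandas releases spell that alias differently; the grouping itself is the same idea as above.

```python
import pandas as pd

# The question's data, with dates still stored as strings.
raw = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Assumed format: day, abbreviated English month name, two-digit year.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d-%b-%y")
df1 = raw.set_index("Date")  # now indexed by a DatetimeIndex

# Month-end frequency as an offset object rather than a string alias.
month_end = pd.offsets.MonthEnd()

# Both spellings produce one row per calendar month, labelled by month end.
by_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum()
by_resample = df1.resample(month_end).sum()

print(by_grouper)
```

Depending on the pandas version, the months with no rows may show up as 0 rather than NaN after sum().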

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.). I'm including this for interest's sake.

If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

    In [21]: df
    Out[21]:
            Date  abc  xyz
    0 2013-06-01  100  200
    1 2013-06-03  -20   50
    2 2013-08-15   40   -5
    3 2014-01-20   25   15
    4 2014-02-21   60   80

    In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
    Out[22]:
    <class 'pandas.tseries.period.PeriodIndex'>
    [2013-06, ..., 2014-02]
    Length: 5, Freq: M

    In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

    In [24]: g = df.groupby(per)

    In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
    Out[25]:
             abc  xyz
    2013-06   80  250
    2013-08   40   -5
    2014-01   25   15
    2014-02   60   80

To get the desired result we have to reindex...
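The answer stops short of showing the reindex step, so what follows is only one possible sketch of it, not necessarily what the author had in mind. It assumes the goal is one row per month between the first and last observed period. The names monthly, full and filled are hypothetical.

```python
import pandas as pd

# Same data as df above, with Date as a datetime64 column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group by monthly period; select the numeric columns explicitly so the
# Date column itself is not summed.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Build every month from the first to the last observed period, then
# reindex so that the missing months appear as NaN rows.
full = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
filled = monthly.reindex(full)

print(filled)
```

If zeros are preferable to NaN for the empty months, reindex accepts a fill_value argument.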

156

answered Sep 20 '22 22:09

Andy Hayden

Why not keep it simple?!

    GB = DF.groupby([DF.index.year, DF.index.month]).sum()

giving you,

    print(GB)
            abc  xyz
    2013 6   80  250
         8   40   -5
    2014 1   25   15
         2   60   80

and then you can plot as asked using:

    GB.plot('abc', 'xyz', kind='scatter')
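A self-contained sketch of the grouping step in this answer, assuming DF is the question's data indexed by a DatetimeIndex. The level names "year" and "month" are my additions for readability and are not part of the answer. The plot call is left out here because it needs matplotlib installed.

```python
import pandas as pd

# Question's data with the dates as a DatetimeIndex.
DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Group on the (year, month) pair taken from the index; the result has a
# two-level MultiIndex and only contains months that actually have data.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()
GB.index.names = ["year", "month"]  # hypothetical names, added for clarity

print(GB)
```

Unlike the resample approach, this does not create rows for months without data.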

answered Sep 21 '22 22:09

Q-man
