Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby date

I have a DataFrame with events. One or more events can occur at a date (so the date can't be an index). The date range is several years. I want to groupby years and months and have a count of the Category values. Thnx

in [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1')
In [13]: df
Out[13]:
    EventRefNr     DateOccurence      Type Category
0        86596    2010-01-02 00:00:00     3    Small
1        86779    2010-01-09 00:00:00    13   Medium
2        86780    2010-02-10 00:00:00     6    Small
3        86781    2010-02-09 00:00:00    17    Small
4        86898    2010-02-10 00:00:00     6    Small
5        86898    2010-02-11 00:00:00     6    Small
6        86902    2010-02-17 00:00:00     9    Small
7        86908    2010-02-19 00:00:00     3   Medium
8        86908    2010-03-05 00:00:00     3   Medium
9        86909    2010-03-06 00:00:00     8    Small
10       86930    2010-03-12 00:00:00    29    Small
11       86934    2010-03-16 00:00:00     9    Small
12       86940    2010-04-08 00:00:00     9     High
13       86941    2010-04-09 00:00:00    17    Small
14       86946    2010-04-14 00:00:00    10    Small
15       86950    2011-01-19 00:00:00    12    Small
16       86956    2011-01-24 00:00:00    13    Small
17       86959    2011-01-27 00:00:00    17    Small

I tried:

df.groupby(df['DateOccurence'])
like image 923
ArtDijk Avatar asked Oct 17 '13 20:10

ArtDijk


2 Answers

For the month and year break out I often add additional columns to the data frame that break out the dates into each piece:

df['year'] = [t.year for t in df.DateOccurence]
df['month'] = [t.month for t in df.DateOccurence]
df['day'] = [t.day for t in df.DateOccurence]

It adds space complexity (adding columns to the df) but is less time complex (less processing on groupby) than a datetime index but it's really up to you. datetime index is the more pandas way to do things.

After breaking out by year, month, day you can do any groupby you need.

df.groupby['year','month'].Category.apply(pd.value_counts)

To get months across multiple years:

df.groupby['month'].Category.apply(pd.value_counts)

Or in Andy Hayden's datetime index

df.groupby[di.month].Category.apply(pd.value_counts)

You can simply pick which method fits your needs better.

like image 128
cwharland Avatar answered Nov 01 '22 21:11

cwharland


You can apply value_counts to the SeriesGroupby (for the column):

In [11]: g = df.groupby('DateOccurence')

In [12]: g.Category.apply(pd.value_counts)
Out[12]: 
DateOccurence        
2010-01-02     Small     1
2010-01-09     Medium    1
2010-02-09     Small     1
2010-02-10     Small     2
2010-02-11     Small     1
2010-02-17     Small     1
2010-02-19     Medium    1
2010-03-05     Medium    1
2010-03-06     Small     1
2010-03-12     Small     1
2010-03-16     Small     1
2010-04-08     High      1
2010-04-09     Small     1
2010-04-14     Small     1
2011-01-19     Small     1
2011-01-24     Small     1
2011-01-27     Small     1
dtype: int64

I actually hoped this to return the following DataFrame, but you need to unstack it:

In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0)
Out[13]: 
               High  Medium  Small
DateOccurence                     
2010-01-02        0       0      1
2010-01-09        0       1      0
2010-02-09        0       0      1
2010-02-10        0       0      2
2010-02-11        0       0      1
2010-02-17        0       0      1
2010-02-19        0       1      0
2010-03-05        0       1      0
2010-03-06        0       0      1
2010-03-12        0       0      1
2010-03-16        0       0      1
2010-04-08        1       0      0
2010-04-09        0       0      1
2010-04-14        0       0      1
2011-01-19        0       0      1
2011-01-24        0       0      1
2011-01-27        0       0      1

If there were multiple different Categories with the same Date they would be on the same row...

like image 24
Andy Hayden Avatar answered Nov 01 '22 21:11

Andy Hayden