Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Max and Min date in pandas groupby

I have a dataframe that looks like:

data = {'index': ['2014-06-22 10:46:00', '2014-06-24 19:52:00', '2014-06-25 17:02:00', '2014-06-25 17:55:00', '2014-07-02 11:36:00', '2014-07-06 12:40:00', '2014-07-05 12:46:00', '2014-07-27 15:12:00'],     'type': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'C'],     'sum_col': [1, 2, 3, 1, 1, 3, 2, 1]} df = pd.DataFrame(data, columns=['index', 'type', 'sum_col']) df['index'] = pd.to_datetime(df['index']) df = df.set_index('index') df['weekofyear'] = df.index.weekofyear df['date'] = df.index.date df['date'] = pd.to_datetime(df['date'])                         type sum_col weekofyear   date index                2014-06-22 10:46:00    A    1       25      2014-06-22 2014-06-24 19:52:00    B    2       26      2014-06-24 2014-06-25 17:02:00    C    3       26      2014-06-25 2014-06-25 17:55:00    A    1       26      2014-06-25 2014-07-02 11:36:00    B    1       27      2014-07-02 2014-07-06 12:40:00    C    3       27      2014-07-06 2014-07-05 12:46:00    A    2       27      2014-07-05 2014-07-27 15:12:00    C    1       30      2014-07-27 

I'm looking to groupby the weekofyear, then sum up the sum_col. In addition, I need to find the earliest, and the latest date for the week. The first part is pretty easy:

gb = df.groupby(['type', 'weekofyear']) gb['sum_col'].agg({'sum_col' : np.sum}) 

I've tried to find the min/max date with this, but haven't been successful:

gb = df.groupby(['type', 'weekofyear']) gb.agg({'sum_col' : np.sum,         'date' : np.min,         'date' : np.max}) 

How would one find the earliest/latest date that appears?

like image 719
DataSwede Avatar asked Jul 29 '14 20:07

DataSwede


People also ask

What does Groupby Max do?

max. Compute max of group values. Include only float, int, boolean columns.


2 Answers

You need to combine the functions that apply to the same column, like this:

In [116]: gb.agg({'sum_col' : np.sum,      ...:         'date' : [np.min, np.max]}) Out[116]:                        date             sum_col                       amin       amax      sum type weekofyear                                A    25         2014-06-22 2014-06-22        1      26         2014-06-25 2014-06-25        1      27         2014-07-05 2014-07-05        2 B    26         2014-06-24 2014-06-24        2      27         2014-07-02 2014-07-02        1 C    26         2014-06-25 2014-06-25        3      27         2014-07-06 2014-07-06        3      30         2014-07-27 2014-07-27        1 
like image 194
chrisb Avatar answered Sep 28 '22 08:09

chrisb


Simple code can be

df.groupby([key_field]).agg({'time_field': [np.min,np.max]}) 

where key_field here can be event_id and time_field can be timestamp field.

like image 32
Shantanu Deshmukh Avatar answered Sep 28 '22 08:09

Shantanu Deshmukh