Pandas: Group by bi-monthly date field

Question

I am trying to group hospital staff working hours bi-monthly (twice a month). I have raw daily data that looks like this:

date       hours_spent  emp_id  
9/11/2016     8          1  
15/11/2016    8          1  
22/11/2016    8          2  
23/11/2016    8          1

This is how I want to group it:

cycle                 hours_spent       emp_id   
1/11/2016-15/11/2016      16                 1
16/11/2016-30/11/2016      8                 2
16/11/2016-30/11/2016      8                 1

I am trying to do this with pd.Grouper and a frequency in pandas, something like the following:

data.set_index('date',inplace=True)
print(data.head())
dt = data.groupby(['emp_id', pd.Grouper(key='date', freq='MS')])['hours_spent'].sum().reset_index().sort_values('date')

#df.resample('10d').mean().interpolate(method='linear',axis=0)
print(dt.resample('SMS').sum())

I also tried resampling:

df1 = dt.resample('MS', loffset=pd.Timedelta(15, 'd')).sum()
data.set_index('date',inplace=True)
df1 = data.resample('MS', loffset=pd.Timedelta(15, 'd')).sum()

But this gives data at 15-day intervals, not split into the 1st to the 15th and the 16th to the end of the month.

Please let me know what I am doing wrong here.

Vivek Kalyanarangan · Accepted Answer

You were almost there. This will do it:

dt = df.groupby(['emp_id', pd.Grouper(key='date', freq='SM')])['hours_spent'].sum().reset_index().sort_values('date')

emp_id  date    hours_spent
1   2016-10-31  8
1   2016-11-15  16
2   2016-11-15  8

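freq='SM' is the semi-month-end frequency: the bin edges fall on the 15th and the last day of every month. Each row is labelled with the start of its bin, so 2016-10-31 covers dates from 31 October up to (but not including) 15 November, and 2016-11-15 covers 15 November up to the end of the month. Note that newer pandas releases (2.2 onwards, as far as I know) spell this alias 'SME'.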
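
In the output above, the entry for 15/11/2016 is counted in the 2016-11-15 bin, whereas the question asks for 1 to 15 and 16 to end of month. If that exact split matters, one option is to build the cycle label from the day of the month and group on it. The following is only a sketch, not part of the original answer: the helper name cycle_label and the sample frame are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (hours_spent is 8 for every row).
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-11-09", "2016-11-15",
                            "2016-11-22", "2016-11-23"]),
    "hours_spent": [8, 8, 8, 8],
    "emp_id": [1, 1, 2, 1],
})

def cycle_label(ts):
    """Hypothetical helper: return the half-month cycle a timestamp falls in."""
    if ts.day <= 15:
        start, end = ts.replace(day=1), ts.replace(day=15)
    else:
        # MonthEnd(0) rolls forward to the last day of the same month.
        start, end = ts.replace(day=16), ts + pd.offsets.MonthEnd(0)
    return (f"{start.day}/{start.month}/{start.year}-"
            f"{end.day}/{end.month}/{end.year}")

df["cycle"] = df["date"].map(cycle_label)

# sort=False keeps groups in order of first appearance.
result = (
    df.groupby(["cycle", "emp_id"], sort=False)["hours_spent"]
      .sum()
      .reset_index()
)
print(result)
```

The labels are plain strings here, so they match the layout in the question but will not sort chronologically across months; keeping the start date as a separate column would help if ordering matters.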

vnthn · Answer

Put DateTime-Values into Bins

If I understand you correctly, you want to sort the values in the date column into bins. pandas provides the pd.cut() function for exactly this.

Here's an approach which might help you:

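import pandas as pd

df = pd.DataFrame({
    'hours': 8,
    'emp_id': [1, 1, 2, 1],
    # pd.datetime is gone from recent pandas; parse the dates instead
    'date': pd.to_datetime(['2016-11-09', '2016-11-15',
                            '2016-11-22', '2016-11-23']),
})
# Bin edges: 2016-10-31, 2016-11-15, 2016-11-30
# ('SM' = semi-month end; newer pandas versions name this alias 'SME')
bins_dt = pd.date_range('2016-10-16', freq='SM', periods=3)
# Assign each date to the right-closed interval between two edges
cycle = pd.cut(df['date'], bins_dt)
# Sum only the hours column; the datetime column cannot be summed
df.groupby([cycle, 'emp_id'])[['hours']].sum()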

Which gets you:

cycle                    emp_id hours 
------------------------ ------ ------
(2016-10-31, 2016-11-15] 1      16    
                         2      NaN   
(2016-11-15, 2016-11-30] 1      8     
                         2      8
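
The NaN row appears because pd.cut() returns a categorical, and groupby reports every combination of category and emp_id, including ones with no data. A possible variation, sketched here under the assumption that you only want combinations that actually occur, is to pass observed=True. The bin edges are written out explicitly in this sketch so that it does not depend on the spelling of the frequency alias.

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [8, 8, 8, 8],
    "emp_id": [1, 1, 2, 1],
    "date": pd.to_datetime(["2016-11-09", "2016-11-15",
                            "2016-11-22", "2016-11-23"]),
})

# Same edges as the date_range call above, listed by hand.
bins_dt = pd.to_datetime(["2016-10-31", "2016-11-15", "2016-11-30"])

# Right-closed intervals: (31 Oct, 15 Nov] and (15 Nov, 30 Nov].
cycle = pd.cut(df["date"], bins_dt)

# observed=True drops category/emp_id pairs that have no rows.
totals = df.groupby([cycle, "emp_id"], observed=True)["hours"].sum()
print(totals)
```

Because the intervals are closed on the right, 15/11/2016 lands in the first cycle, which matches the 1 to 15 and 16 to end of month split asked for in the question.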

Tags:

python

pandas

Django Man
