Pandas: Group by bi-monthly date field

Tags: python, pandas

I am trying to group hospital staff working hours into bi-monthly (half-month) cycles. I have raw daily data that looks like this:

date       hours_spent  emp_id
9/11/2016     8          1  
15/11/2016    8          1  
22/11/2016    8          2  
23/11/2016    8          1

This is how I want the result grouped:

cycle                 hours_spent       emp_id
1/11/2016-15/11/2016      16                 1
16/11/2016-30/11/2016      8                 2
16/11/2016-30/11/2016      8                 1

I am trying to do this with pd.Grouper and a frequency in pandas, something like the following:

data.set_index('date',inplace=True)
print(data.head())
dt = data.groupby(['emp_id', pd.Grouper(key='date', freq='MS')])['hours_spent'].sum().reset_index().sort_values('date')

#df.resample('10d').mean().interpolate(method='linear',axis=0)
print(dt.resample('SMS').sum())

I also tried resampling:

df1 = dt.resample('MS', loffset=pd.Timedelta(15, 'd')).sum()
data.set_index('date',inplace=True)
df1 = data.resample('MS', loffset=pd.Timedelta(15, 'd')).sum()

But this gives fixed 15-day intervals, not cycles running from the 1st to the 15th and from the 16th to the end of the month.

Please let me know what I am doing wrong here.

Django Man asked Nov 01 '18 07:11


2 Answers

You were almost there. This will do it:

dt = df.groupby(['emp_id', pd.Grouper(key='date', freq='SM')])['hours_spent'].sum().reset_index().sort_values('date')

emp_id  date    hours_spent
1   2016-10-31  8
1   2016-11-15  16
2   2016-11-15  8

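freq='SM' is the semi-month-end frequency: it anchors the bins on the 15th and the last day of every month. Each row in the output above is labelled with the start of its bin, so the 15th itself falls into the second half, under the 2016-11-15 label. In pandas 2.2 and later this alias is spelled 'SME'.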
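
The question's desired output counts the 15th in the first half of the month. One way to get exactly that, without relying on a frequency alias, is to derive the cycle from the day of the month. The following is a minimal sketch, not part of the original answer; the helper name cycle_label and the label format are assumptions chosen to mirror the question's table.

```python
import pandas as pd

# Sample data from the question (dates are day/month/year).
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["9/11/2016", "15/11/2016", "22/11/2016", "23/11/2016"],
        format="%d/%m/%Y",
    ),
    "hours_spent": [8, 8, 8, 8],
    "emp_id": [1, 1, 2, 1],
})

def cycle_label(ts):
    """Hypothetical helper: label a timestamp with its half-month cycle."""
    if ts.day <= 15:
        start, end = ts.replace(day=1), ts.replace(day=15)
    else:
        # days_in_month handles 28/29/30/31-day months
        start, end = ts.replace(day=16), ts.replace(day=ts.days_in_month)
    return f"{start:%d/%m/%Y}-{end:%d/%m/%Y}"

df["cycle"] = df["date"].map(cycle_label)

# One row per (cycle, employee) with the summed hours.
result = df.groupby(["cycle", "emp_id"], as_index=False)["hours_spent"].sum()
print(result)
```

The labels here are plain strings, so they sort alphabetically rather than chronologically once the data spans several months; keeping the cycle start date as an extra column would be one way around that.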

Vivek Kalyanarangan answered Oct 28 '22 14:10


Put DateTime-Values into Bins

If I understand you correctly, you want to put the values of the date column into bins. pandas provides pd.cut() for this: it assigns each value to an interval defined by a list of bin edges, and you can then group by those intervals.

Here's an approach which might help you:

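import pandas as pd

df = pd.DataFrame({
  'hours'  : 8,
  'emp_id' : [1,1,2,1],
  # pd.datetime was removed in pandas 2.0; pd.Timestamp works across versions
  'date'   : [pd.Timestamp(2016,11,9),
              pd.Timestamp(2016,11,15),
              pd.Timestamp(2016,11,22),
              pd.Timestamp(2016,11,23)]
     })
# Bin edges: 2016-10-31, 2016-11-15, 2016-11-30
# ('SM' is spelled 'SME' in pandas 2.2 and later)
bins_dt = pd.date_range('2016-10-16', freq='SM', periods=3)
# pd.cut builds right-closed intervals by default, so the 15th lands in the first half
cycle = pd.cut(df.date, bins_dt)
# Sum only the numeric column; a datetime column cannot be summed
df.groupby([cycle, 'emp_id'])[['hours']].sum()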

Which gets you (newer pandas versions may print 0 instead of NaN for the empty group):

cycle                    emp_id hours 
------------------------ ------ ------
(2016-10-31, 2016-11-15] 1      16    
                         2      NaN   
(2016-11-15, 2016-11-30] 1      8     
                         2      8      
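
The bin edges above cover a single month. As a sketch of how the same pd.cut idea might extend to a longer date range, the edges can be built month by month. This is an illustration under assumptions rather than part of the original answer: the helper name semi_month_edges is hypothetical, and observed=True is passed so that empty (cycle, employee) combinations are dropped instead of shown.

```python
import pandas as pd

def semi_month_edges(first_month, last_month):
    """Hypothetical helper: edges on the day before each month starts,
    on each 15th, and on the last day of the final month."""
    edges = []
    for period in pd.period_range(first_month, last_month, freq="M"):
        first_day = period.start_time                    # midnight on the 1st
        edges.append(first_day - pd.Timedelta(days=1))   # last day of previous month
        edges.append(first_day + pd.Timedelta(days=14))  # the 15th
    # Close the final bin at the end of the last month.
    edges.append(pd.Period(last_month, freq="M").end_time.normalize())
    return pd.DatetimeIndex(edges)

df = pd.DataFrame({
    "hours": [8, 8, 8, 8],
    "emp_id": [1, 1, 2, 1],
    "date": pd.to_datetime(["2016-11-09", "2016-11-15", "2016-11-22", "2016-11-23"]),
})

edges = semi_month_edges("2016-11", "2016-11")
# Right-closed intervals: (previous month end, 15th] and (15th, month end]
cycle = pd.cut(df["date"], edges)
totals = df.groupby([cycle, "emp_id"], observed=True)["hours"].sum()
print(totals)
```

Because the intervals are right-closed, a row dated the 15th is counted in the first half of the month and a row dated the 16th in the second, which matches the cycles described in the question.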
vnthn answered Oct 28 '22 13:10