
Apply set_index over groupby object in order to apply asfreq per group

Tags:

pandas

I'm looking to apply padding over each group of my data frame.

Notice that for a single group ('element_id') I have no problem with padding.

First group (group1):

{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}

So I'm applying padding over it (which works great):

print(group1.set_index('date').asfreq('D', method='pad').head())
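
For reference, a runnable, self-contained reconstruction of this single-group step, sketched from the dict above (the date column is converted with pd.to_datetime and the index sorted first, since asfreq needs a monotonic DatetimeIndex):

import datetime
import pandas as pd

# rebuild group1 from the dict posted above
group1 = pd.DataFrame({'date': {88: datetime.date(2017, 10, 3),
                                43: datetime.date(2017, 9, 26),
                                159: datetime.date(2017, 11, 8)},
                       'element_id': {88: 122, 43: 122, 159: 122},
                       'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}})

# asfreq needs a DatetimeIndex, so convert the date column first
group1['date'] = pd.to_datetime(group1['date'])

print(group1.set_index('date').sort_index().asfreq('D', method='pad').head())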

I'm looking to apply this logic over several groups through groupby.

Another group (group2):

{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}

group_data=pd.concat([group1,group2],axis=0)
group_data.groupby(['element_id']).set_index('date').resample('D').asfreq()

And I'm getting the following error:

AttributeError: Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using the 'apply' method
asked Nov 23 '17 by Yehoshaphat Schellekens


1 Answer

First, there is a problem: your date column has dtype object, not datetime, so it is necessary to convert it with to_datetime first.

Then it is possible to use GroupBy.apply:

group_data['date'] = pd.to_datetime(group_data['date'])

df = (group_data.groupby(['element_id'])
                .apply(lambda x: x.set_index('date').resample('D').ffill()))

print (df.head())

                      VALUE  element_id
element_id date                        
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122

Or DataFrameGroupBy.resample:

df = group_data.set_index('date').groupby(['element_id']).resample('D').ffill()
print (df.head())
                      VALUE  element_id
element_id date                        
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122
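
In both variants element_id ends up duplicated in the index and as a column; if a flat frame is preferred afterwards, one possible follow-up (a small sketch, not part of the original answer) is to drop the column and reset the index:

# drop the duplicated group column and move the MultiIndex levels back to columns
flat = df.drop(columns='element_id').reset_index()
print(flat.head())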

EDIT:

If there is a problem with duplicated values, the solution is to add a new column for subgroups with unique dates. If you use concat, there is a keys parameter for it:

group1 = pd.DataFrame({'date': {88: datetime.date(2017, 10, 3), 
                                43: datetime.date(2017, 9, 26), 
                                159: datetime.date(2017, 11, 8)}, 
                       u'element_id': {88: 122, 43: 122, 159: 122}, 
                       u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}})


d = {'level_0':'g'}
group_data=pd.concat([group1,group1], keys=('a','b')).reset_index(level=0).rename(columns=d)
print (group_data)
     g VALUE        date  element_id
43   a   2.0  2017-09-26         122
88   a   8.0  2017-10-03         122
159  a   5.0  2017-11-08         122
43   b   2.0  2017-09-26         122
88   b   8.0  2017-10-03         122
159  b   5.0  2017-11-08         122


group_data['date'] = pd.to_datetime(group_data['date'])

df = (group_data.groupby(['g','element_id'])
                .apply(lambda x: x.set_index('date').resample('D').ffill()))

print (df.head())

                         g VALUE  element_id
g element_id date                           
a 122        2017-09-26  a   2.0         122
             2017-09-27  a   2.0         122
             2017-09-28  a   2.0         122
             2017-09-29  a   2.0         122
             2017-09-30  a   2.0         122
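
If a single padded subgroup is needed back out of this result, it can be sliced from the MultiIndex, for example with xs (an illustrative sketch; 'a' is one of the keys passed to concat above):

# select the subgroup labelled 'a' and drop that index level
group_a = df.xs('a', level='g')
print(group_a.head())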
answered Sep 24 '22 by jezrael