Apply set_index over groupby object in order to apply asfreq per group

Tags:

pandas

I'm looking to apply padding over each group of my DataFrame.

Notice that for a single group ('element_id') I have no problem padding:

first group (group1):

{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}

So I'm applying padding to it (which works great):

print(group1.set_index('date').asfreq('D', method='pad').head())
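For reference, here is a minimal Python 3 sketch of the single-group case that you can run end to end. It assumes the dict above is wrapped in pd.DataFrame. It converts the datetime.date values with pd.to_datetime, because asfreq works on a DatetimeIndex. It also adds sort_index() as a precaution, since pad-filling reindexes and expects a monotonic index.

```python
import datetime

import pandas as pd

# The dict from the question, wrapped in a DataFrame (assumption)
group1 = pd.DataFrame({
    'date': {88: datetime.date(2017, 10, 3),
             43: datetime.date(2017, 9, 26),
             159: datetime.date(2017, 11, 8)},
    'element_id': {88: 122, 43: 122, 159: 122},
    'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'},
})

# Convert Python date objects to datetime64 so set_index gives a DatetimeIndex
group1['date'] = pd.to_datetime(group1['date'])

# Sort by date (precaution for method='pad'), then fill every missing day
# with the last known value
padded = group1.set_index('date').sort_index().asfreq('D', method='pad')
print(padded.head())
```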

I'm looking to apply this logic to several groups through groupby.

Another group (group2):

{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}

group_data=pd.concat([group1,group2],axis=0)
group_data.groupby(['element_id']).set_index('date').resample('D').asfreq()

And I'm getting the following error:

AttributeError: Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using the 'apply' method


asked Nov 23 '17 11:11

Yehoshaphat Schellekens

1 Answer

First, there is a problem: your date column has dtype object, not datetime, so you first need to convert it with to_datetime.
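To see why the conversion matters, the sketch below uses two of the dates from the question. It checks the column dtype before and after pd.to_datetime and shows that resample rejects an index of plain date objects. The TypeError is what pandas raises for an index that is not datetime-like. The exact message wording may differ between pandas versions.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'date': [datetime.date(2017, 9, 26), datetime.date(2017, 10, 3)],
                   'VALUE': ['2.0', '8.0']})

# datetime.date values are stored as generic Python objects
dtype_before = df['date'].dtype
print(dtype_before)  # object

# resample needs a DatetimeIndex, TimedeltaIndex or PeriodIndex
try:
    df.set_index('date').resample('D')
    resample_failed = False
except TypeError as exc:
    resample_failed = True
    print('resample failed:', exc)

# After conversion the column has a real datetime64 dtype
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtype)
```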

Then you can use GroupBy.apply:

group_data['date'] = pd.to_datetime(group_data['date'])

df = (group_data.groupby(['element_id'])
                .apply(lambda x: x.set_index('date').resample('D').ffill()))

print (df.head())

                      VALUE  element_id
element_id date                        
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122
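Below is a self-contained sketch of the apply approach. The second element_id (123) and its values are hypothetical. They exist only so there are two groups with different date ranges. The sketch also selects the date and VALUE columns before apply. This is an assumption made to keep the grouping column out of the lambda, because recent pandas versions warn about passing it in. As a result, the output holds VALUE only, without a duplicated element_id column.

```python
import datetime

import pandas as pd

# element_id 123 is a hypothetical second group, not from the question
group_data = pd.DataFrame({
    'date': [datetime.date(2017, 9, 26), datetime.date(2017, 10, 3),
             datetime.date(2017, 10, 1), datetime.date(2017, 10, 4)],
    'element_id': [122, 122, 123, 123],
    'VALUE': ['2.0', '8.0', '1.0', '4.0'],
})
group_data['date'] = pd.to_datetime(group_data['date'])

# Each group is padded between its own first and last date;
# the group key becomes the outer level of the result's MultiIndex
df = (group_data.groupby('element_id')[['date', 'VALUE']]
      .apply(lambda x: x.set_index('date').resample('D').ffill()))
print(df)
```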

Or use DataFrameGroupBy.resample:

df = group_data.set_index('date').groupby(['element_id']).resample('D').ffill()
print (df.head())

                      VALUE  element_id
element_id date                        
122        2017-09-26   2.0         122
           2017-09-27   2.0         122
           2017-09-28   2.0         122
           2017-09-29   2.0         122
           2017-09-30   2.0         122
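The same hypothetical two-group data can also go through the groupby-then-resample route. This sketch selects the VALUE column before resampling, again an assumption to avoid operating on the grouping column. The result is therefore a Series indexed by (element_id, date) instead of a DataFrame.

```python
import datetime

import pandas as pd

# element_id 123 is a hypothetical second group, not from the question
group_data = pd.DataFrame({
    'date': [datetime.date(2017, 9, 26), datetime.date(2017, 10, 3),
             datetime.date(2017, 10, 1), datetime.date(2017, 10, 4)],
    'element_id': [122, 122, 123, 123],
    'VALUE': ['2.0', '8.0', '1.0', '4.0'],
})
group_data['date'] = pd.to_datetime(group_data['date'])

# Put the dates in the index, then resample each element_id separately
s = (group_data.set_index('date')
     .groupby('element_id')['VALUE']
     .resample('D')
     .ffill())
print(s)
```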

EDIT:

If the problem is duplicate values, the solution is to add a new column that splits each group into subgroups with unique dates. With concat, the keys parameter can create it:

import datetime
import pandas as pd

group1 = pd.DataFrame({'date': {88: datetime.date(2017, 10, 3), 
                                43: datetime.date(2017, 9, 26), 
                                159: datetime.date(2017, 11, 8)}, 
                       u'element_id': {88: 122, 43: 122, 159: 122}, 
                       u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}})


d = {'level_0':'g'}
group_data=pd.concat([group1,group1], keys=('a','b')).reset_index(level=0).rename(columns=d)
print (group_data)
     g VALUE        date  element_id
43   a   2.0  2017-09-26         122
88   a   8.0  2017-10-03         122
159  a   5.0  2017-11-08         122
43   b   2.0  2017-09-26         122
88   b   8.0  2017-10-03         122
159  b   5.0  2017-11-08         122


group_data['date'] = pd.to_datetime(group_data['date'])

df = (group_data.groupby(['g','element_id'])
                .apply(lambda x: x.set_index('date').resample('D').ffill()))

print (df.head())

                         g VALUE  element_id
g element_id date                           
a 122        2017-09-26  a   2.0         122
             2017-09-27  a   2.0         122
             2017-09-28  a   2.0         122
             2017-09-29  a   2.0         122
             2017-09-30  a   2.0         122
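You can check directly why the extra g key helps. When group1 is concatenated with itself, grouping by element_id alone leaves repeated dates inside the group. Grouping by ['g', 'element_id'] gives every subgroup unique dates, which upsampling needs. The sketch below only inspects uniqueness and uses a trimmed version of group1 (two of its rows) for brevity.

```python
import datetime

import pandas as pd

# Two rows taken from group1 in the question
group1 = pd.DataFrame({'date': [datetime.date(2017, 9, 26), datetime.date(2017, 10, 3)],
                       'element_id': [122, 122],
                       'VALUE': ['2.0', '8.0']})

# Stack the same frame twice; keys= labels each copy as subgroup 'a' or 'b'
group_data = (pd.concat([group1, group1], keys=('a', 'b'))
              .reset_index(level=0)
              .rename(columns={'level_0': 'g'}))
group_data['date'] = pd.to_datetime(group_data['date'])

# Are the dates unique within each group?
by_id = group_data.groupby('element_id')['date'].apply(lambda d: d.is_unique)
by_g_id = group_data.groupby(['g', 'element_id'])['date'].apply(lambda d: d.is_unique)
print(by_id)    # False for element_id 122: each date appears twice
print(by_g_id)  # True for both ('a', 122) and ('b', 122)
```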

answered Sep 24 '22 19:09

jezrael
