Im looking to apply pading
over each group of my data frame
notice that for a single group ('element_id') i have no problem in pading:
first group (group1):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
So im applying padding over it (which works great):
print group1.set_index('date').asfreq('D', method='pad').head()
Im looking to apply this logic over several groups through groupby
Another group (group2):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
group_data=pd.concat([group1,group2],axis=0)
group_data.groupby(['element_id']).set_index('date').resample('D').asfreq()
And im getting the following error:
AttributeError: Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using the 'apply' method
groupby() to Iterate over Data frame Groups. DataFrame. groupby() function in Python is used to split the data into groups based on some criteria.
You can group DataFrame rows into a list by using pandas. DataFrame. groupby() function on the column of interest, select the column you want as a list from group and then use Series. apply(list) to get the list for every group.
Groupby preserves the order of rows within each group.
How to perform groupby index in pandas? Pass index name of the DataFrame as a parameter to groupby() function to group rows on an index. DataFrame. groupby() function takes string or list as a param to specify the group columns or index.
First there is problem your date
column has dtype
object, not datetime, so first is necessary convert it by to_datetime
.
Then is possible use GroupBy.apply
:
group_data['date'] = pd.to_datetime(group_data['date'])
df = (group_data.groupby(['element_id'])
.apply(lambda x: x.set_index('date').resample('D').ffill()))
print (df.head())
VALUE element_id
element_id date
122 2017-09-26 2.0 122
2017-09-27 2.0 122
2017-09-28 2.0 122
2017-09-29 2.0 122
2017-09-30 2.0 122
Or DataFrameGroupBy.resample
:
df = group_data.set_index('date').groupby(['element_id']).resample('D').ffill()
print (df.head())
VALUE element_id
element_id date
122 2017-09-26 2.0 122
2017-09-27 2.0 122
2017-09-28 2.0 122
2017-09-29 2.0 122
2017-09-30 2.0 122
EDIT:
If problem with duplicates values solution is add new column for subgroups with unique dates
. If use concat
there is parameter keys
for it:
group1 = pd.DataFrame({'date': {88: datetime.date(2017, 10, 3),
43: datetime.date(2017, 9, 26),
159: datetime.date(2017, 11, 8)},
u'element_id': {88: 122, 43: 122, 159: 122},
u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}})
d = {'level_0':'g'}
group_data=pd.concat([group1,group1], keys=('a','b')).reset_index(level=0).rename(columns=d)
print (group_data)
g VALUE date element_id
43 a 2.0 2017-09-26 122
88 a 8.0 2017-10-03 122
159 a 5.0 2017-11-08 122
43 b 2.0 2017-09-26 122
88 b 8.0 2017-10-03 122
159 b 5.0 2017-11-08 122
group_data['date'] = pd.to_datetime(group_data['date'])
df = (group_data.groupby(['g','element_id'])
.apply(lambda x: x.set_index('date').resample('D').ffill()))
print (df.head())
g VALUE element_id
g element_id date
a 122 2017-09-26 a 2.0 122
2017-09-27 a 2.0 122
2017-09-28 a 2.0 122
2017-09-29 a 2.0 122
2017-09-30 a 2.0 122
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With