I have a dataset like:
id  date      value
1   16-12-1   9
1   16-12-1   8
1   17-1-1    18
2   17-3-4    19
2   17-3-4    20
1   17-4-3    21
2   17-7-13   12
3   17-8-9    12
2   17-9-12   11
1   17-11-12  19
3   17-11-12  21
The only structure above is that the rows are sorted by date.
What I want to do is group by id and resample the dates so that each id has the same number of values. A monthly, weekly or daily resampling would suffice.
My final dataset (at yearly resampling) would look like:
id  interval            value
1   16-12-1 - 17-12-1   75
2   16-12-1 - 17-12-1   62
3   16-12-1 - 17-12-1   33
How can I implement this? Will the following work, given that the date field has no time component (i.e. it is not yet a standard pandas datetime object)?
dataframe.groupby(id).resample('year')
Is there any faster way to do this?
Weekly sum by id:
import pandas as pd

df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')  # parse the yy-m-d strings
df = df.set_index('date')
# resample('W') fills in empty weeks, so drop the all-zero rows afterwards
df.groupby('id').resample('W')['value'].sum().loc[lambda x: x > 0]
Output:
id  date
1   2016-12-04    17
    2017-01-01    18
    2017-04-09    21
    2017-11-12    19
2   2017-03-05    39
    2017-07-16    12
    2017-09-17    11
3   2017-08-13    12
    2017-11-12    21
Name: value, dtype: int64
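The same pattern works at the other frequencies mentioned in the question. For example, a monthly sum per id; this is a minimal sketch assuming df still holds the original yy-m-d string dates (on pandas 2.2+ the frequency alias is 'ME' rather than 'M'):

import pandas as pd

# Parse the dates and index by date, as in the weekly example
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
df = df.set_index('date')
# Monthly bins per id; drop the empty months that resample fills with 0
monthly = df.groupby('id').resample('M')['value'].sum().loc[lambda x: x > 0]
print(monthly)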
Updated Attempt (with re-sampling)
Generate data
import pandas as pd

d = [['id', 'date', 'value'],
     [1, '2016-12-1', 9],
     [1, '2016-12-1', 8],
     [1, '2017-1-1', 18],
     [2, '2017-3-4', 19],
     [2, '2017-3-4', 20],
     [1, '2017-4-3', 21],
     [2, '2017-7-13', 12],
     [3, '2017-8-9', 12],
     [2, '2017-9-12', 11],
     [1, '2017-11-12', 19],
     [3, '2017-11-12', 21]]
df = pd.DataFrame(d[1:], columns=d[0])
print(df)
    id        date  value
0    1   2016-12-1      9
1    1   2016-12-1      8
2    1    2017-1-1     18
3    2    2017-3-4     19
4    2    2017-3-4     20
5    1    2017-4-3     21
6    2   2017-7-13     12
7    3    2017-8-9     12
8    2   2017-9-12     11
9    1  2017-11-12     19
10   3  2017-11-12     21
Yearly re-sampling followed by sum
# Use the parsed dates as the index so resample can work on them
df.index = pd.to_datetime(df.date)
df.drop('date', axis=1, inplace=True)
# Yearly bins per id; the id column gets summed too, so drop it afterwards
df_g = df.groupby('id').resample('Y').sum()
df_g.drop(columns=['id'], inplace=True)
print(df_g)
               value
id date
1  2016-12-31     17
   2017-12-31     58
2  2017-12-31     62
3  2017-12-31     33
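If the target is the question's "id interval value" layout rather than a MultiIndex, the year-end index can be reshaped into an interval string. Continuing from df_g above, this is only a sketch: it formats calendar-year boundaries (e.g. 17-01-01 - 17-12-31), not the December-to-December window shown in the question, and the interval column name is taken from the question:

# Flatten the (id, year-end) index and format each calendar year as an interval
out = df_g.reset_index()
out['interval'] = ((out['date'] - pd.offsets.YearBegin(1)).dt.strftime('%y-%m-%d')
                   + ' - ' + out['date'].dt.strftime('%y-%m-%d'))
print(out[['id', 'interval', 'value']])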
INITIAL answer (without re-sampling)
You could do this, but it does not perform an explicit resample operation:
# Aggregate per id: first and last date plus the total value
f = {'date': ['min', 'max'],
     'value': 'sum'}
df_grouped = df.groupby(['id']).agg(f)
# Flatten the (column, aggfunc) MultiIndex into names like date_min
df_grouped.columns = ['_'.join(col) for col in df_grouped.columns.values]
df_grouped.reset_index(drop=False, inplace=True)
print(df_grouped)
   id  date_min  date_max  value_sum
0   1   16-12-1    17-4-3         75
1   2    17-3-4   17-9-12         62
2   3  17-11-12    17-8-9         33
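Note that in the output above min and max are computed on the raw strings, so for id 3 '17-8-9' sorts after '17-11-12' and the two endpoints come out reversed. Converting date to datetime first avoids that, and the endpoints can then be joined into the question's interval column. A sketch, assuming df is still the original frame with yy-m-d string dates (it uses each id's own first and last date rather than one common window):

# Convert first so min/max compare dates chronologically, not lexicographically
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
g = df.groupby('id').agg(date_min=('date', 'min'),
                         date_max=('date', 'max'),
                         value=('value', 'sum')).reset_index()
g['interval'] = (g['date_min'].dt.strftime('%y-%m-%d')
                 + ' - ' + g['date_max'].dt.strftime('%y-%m-%d'))
print(g[['id', 'interval', 'value']])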