I have the following situation
date_range = pd.date_range('20180101', '20180105')
date_list = list(itertools.chain.from_iterable(itertools.repeat(date, 2) for date in date_range))
num_list = np.random.randint(1,100,size=(10))
date2 = ['2018-12-31']*10
df = pd.DataFrame({'date1':date_list,'numbers':num_list,'date2':date2})
displaying this dataframe gives
date1 date2 numbers
0 2018-01-01 2018-12-31 38
1 2018-01-01 2018-12-31 2
2 2018-01-02 2018-12-31 8
3 2018-01-02 2018-12-31 51
4 2018-01-03 2018-12-31 16
5 2018-01-03 2018-12-31 22
6 2018-01-04 2018-12-31 43
7 2018-01-04 2018-12-31 76
8 2018-01-05 2018-12-31 47
9 2018-01-05 2018-12-31 50
i would like to obtain a new dataframe which is a) grouped by date1, b) sums up the values for each date1 in the numbers column, and c) keeps the date2 value (which we can assume would be the same for each date1 or, in this case, the same for the entire dataframe
i can do the following to achieve a+b, but if i try to include something like 'date2':'mean' in the aggregation dictionary it will not work and return DataError: No numeric types to aggregate
df.groupby(['date1'],as_index=False).agg({'numbers':'sum'})
any advice?
It seems you need if date2 is same for each group:
df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})
Or need aggregate by first:
df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':'first'})
But if need mean of datetime it is a bit complicated:
df['date2'] = pd.to_datetime(df['date2'])
f = lambda x: pd.to_datetime(x.values.astype(np.int64).mean())
df1 = df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':f})
print (df1)
date1 numbers date2
0 2018-01-01 159 2018-12-31
1 2018-01-02 104 2018-12-31
2 2018-01-03 75 2018-12-31
3 2018-01-04 98 2018-12-31
4 2018-01-05 184 2018-12-31
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With