groupby / aggregate dataframe with non-numeric types

Question

I have the following situation

date_range = pd.date_range('20180101', '20180105')
date_list = list(itertools.chain.from_iterable(itertools.repeat(date, 2) for date in date_range))
num_list = np.random.randint(1,100,size=(10))
date2 = ['2018-12-31']*10

df = pd.DataFrame({'date1':date_list,'numbers':num_list,'date2':date2})

displaying this dataframe gives

      date1        date2    numbers
0   2018-01-01  2018-12-31  38
1   2018-01-01  2018-12-31  2
2   2018-01-02  2018-12-31  8
3   2018-01-02  2018-12-31  51
4   2018-01-03  2018-12-31  16
5   2018-01-03  2018-12-31  22
6   2018-01-04  2018-12-31  43
7   2018-01-04  2018-12-31  76
8   2018-01-05  2018-12-31  47
9   2018-01-05  2018-12-31  50

i would like to obtain a new dataframe which is a) grouped by date1, b) sums up the values for each date1 in the numbers column, and c) keeps the date2 value (which we can assume would be the same for each date1 or, in this case, the same for the entire dataframe

i can do the following to achieve a+b, but if i try to include something like 'date2':'mean' in the aggregation dictionary it will not work and return DataError: No numeric types to aggregate

df.groupby(['date1'],as_index=False).agg({'numbers':'sum'})

any advice?

jezrael · Accepted Answer

It seems you need if date2 is same for each group:

df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})

Or need aggregate by first:

df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':'first'})

But if need mean of datetime it is a bit complicated:

df['date2'] = pd.to_datetime(df['date2'])
f = lambda x: pd.to_datetime(x.values.astype(np.int64).mean())
df1 = df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':f})
print (df1)
       date1  numbers      date2
0 2018-01-01      159 2018-12-31
1 2018-01-02      104 2018-12-31
2 2018-01-03       75 2018-12-31
3 2018-01-04       98 2018-12-31
4 2018-01-05      184 2018-12-31

groupby / aggregate dataframe with non-numeric types

Tags:

python

pandas

laszlopanaflex

1 Answers

jezrael

Recent Activity

Donate For Us

groupby / aggregate dataframe with non-numeric types

Tags:

python

pandas

laszlopanaflex

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us