pandas groupby with count, sum and avg

I have the following DataFrame in pandas:

+---------+--------+--------------------+
| keyword | weight |   other keywords   |
+---------+--------+--------------------+
| dog     | 0.12   | [cat, horse, pig]  |
| cat     | 0.5    | [dog, pig, camel]  |
| horse   | 0.07   | [dog, camel, cat]  |
| dog     | 0.1    | [cat, horse]       |
| dog     | 0.2    | [cat, horse, pig]  |
| horse   | 0.3    | [camel]            |
+---------+--------+--------------------+

I want to group by keyword and, in the same step, count how often each keyword occurs, average the weight, and concatenate the other keywords lists. The result should look something like this:

+---------+-----------+------------+------------------------------------------------+
| keyword | frequency | avg weight |                  sum other keywords            |
+---------+-----------+------------+------------------------------------------------+
| dog     |         3 | 0.14       | [cat, horse, pig, cat, horse, cat, horse, pig] |
| cat     |         1 | 0.5        | [dog, pig, camel]                              |
| horse   |         2 | 0.185      | [dog, camel, cat, camel]                       |
+---------+-----------+------------+------------------------------------------------+

I know how to do this with several separate operations (value_counts, groupby.sum(), groupby.mean()) and then merging the results. However, that is inefficient and requires a lot of manual adjustment.

Is it possible to do this in one operation?

pawelty asked Mar 03 '17 09:03


1 Answer

You can use agg:

# one pass over the groups: count and average 'weight', concatenate the 'other keywords' lists
# (the group key itself cannot be aggregated in recent pandas, so 'size' is taken on 'weight')
df = df.groupby('keyword').agg({'weight': ['size', 'mean'], 'other keywords': 'sum'})
# move the group key from the index back into a regular column
df = df.reset_index()
# replace the two-level column labels with the final flat names
df.columns = ['keyword','frequency','avg weight','sum other keywords']

print(df)
  keyword  frequency  avg weight  \
0     cat          1       0.500   
1     dog          3       0.140   
2   horse          2       0.185   

                               sum other keywords  
0                               [dog, pig, camel]  
1  [cat, horse, pig, cat, horse, cat, horse, pig]  
2                        [dog, camel, cat, camel]  
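
On recent pandas versions (0.25 or later), the same result can probably be written with named aggregation, which sets the output column names inside the agg call. The following is only a sketch: the sample data is copied from the question, `concat_lists` and `result` are names made up for illustration, and `sort=False` is an assumption chosen to keep the groups in the order of the question's expected table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "keyword": ["dog", "cat", "horse", "dog", "dog", "horse"],
    "weight": [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    "other keywords": [
        ["cat", "horse", "pig"],
        ["dog", "pig", "camel"],
        ["dog", "camel", "cat"],
        ["cat", "horse"],
        ["cat", "horse", "pig"],
        ["camel"],
    ],
})


def concat_lists(lists):
    """Hypothetical helper: join all lists of one group into a single flat list."""
    return [item for sub in lists for item in sub]


# The output names contain spaces, so the spec is passed as an unpacked dict
# of {new column name: (source column, aggregation)}.
result = (
    df.groupby("keyword", sort=False)  # sort=False keeps first-seen group order
      .agg(**{
          "frequency": ("weight", "size"),
          "avg weight": ("weight", "mean"),
          "sum other keywords": ("other keywords", concat_lists),
      })
      .reset_index()  # turn the group key back into a column
)
print(result)
```

Dropping `sort=False` gives the alphabetical group order shown in the output above. The explicit helper is used instead of 'sum' only to make the list concatenation visible; either should produce the same lists here.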
jezrael answered Sep 18 '22 22:09