I have the following DataFrame in pandas:
+---------+--------+--------------------+
| keyword | weight | other keywords |
+---------+--------+--------------------+
| dog | 0.12 | [cat, horse, pig] |
| cat | 0.5 | [dog, pig, camel] |
| horse | 0.07 | [dog, camel, cat] |
| dog | 0.1 | [cat, horse] |
| dog | 0.2 | [cat, horse, pig] |
| horse | 0.3 | [camel] |
+---------+--------+--------------------+
I want to group by keyword and, in the same step, count how often each keyword occurs, average the weight, and concatenate the other keywords lists. The result would look something like this:
+---------+-----------+------------+------------------------------------------------+
| keyword | frequency | avg weight | sum other keywords |
+---------+-----------+------------+------------------------------------------------+
| dog | 3 | 0.14 | [cat, horse, pig, cat, horse, cat, horse, pig] |
| cat | 1 | 0.5 | [dog, pig, camel] |
| horse | 2 | 0.185 | [dog, camel, cat, camel] |
+---------+-----------+------------+------------------------------------------------+
I know how to do it in several separate operations: value_counts, groupby.sum(), groupby.mean() and then merging the results. However, that is very inefficient and I have to make a lot of manual adjustments.
Is it possible to do it in one operation?
You can use agg:
df = df.groupby('keyword').agg({'keyword':'size', 'weight':'mean', 'other keywords':'sum'})
# set the column order (reindex_axis was removed in pandas 1.0; use reindex)
df = df.reindex(columns=['keyword','weight','other keywords'])
# drop the index name, then move the group keys into a regular column
df = df.rename_axis(None).reset_index()
# set new column names
df.columns = ['keyword','frequency','avg weight','sum other keywords']
print(df)
keyword frequency avg weight \
0 cat 1 0.500
1 dog 3 0.140
2 horse 2 0.185
sum other keywords
0 [dog, pig, camel]
1 [cat, horse, pig, cat, horse, cat, horse, pig]
2 [dog, camel, cat, camel]
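On newer pandas versions the same result can probably be reached without the reindex and rename steps by using named aggregation. The sketch below is only an illustration under a few assumptions: it needs pandas 0.25 or later, it rebuilds the sample data from the question's table, and it swaps 'sum' for a small helper (concat_lists, a name made up here) that joins the lists with itertools.chain. The sort=False argument keeps the groups in order of first appearance, as in the desired output.

```python
from itertools import chain

import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'keyword': ['dog', 'cat', 'horse', 'dog', 'dog', 'horse'],
    'weight': [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    'other keywords': [
        ['cat', 'horse', 'pig'],
        ['dog', 'pig', 'camel'],
        ['dog', 'camel', 'cat'],
        ['cat', 'horse'],
        ['cat', 'horse', 'pig'],
        ['camel'],
    ],
})


def concat_lists(series):
    """Hypothetical helper: join all lists in a group into one flat list."""
    return list(chain.from_iterable(series))


# Named aggregation: each key is an output column name and each value is
# an (input column, aggregation) pair. The dict is unpacked with ** because
# the output names contain spaces and cannot be written as keyword arguments.
out = (
    df.groupby('keyword', sort=False)
      .agg(**{
          'frequency': ('weight', 'size'),
          'avg weight': ('weight', 'mean'),
          'sum other keywords': ('other keywords', concat_lists),
      })
      .reset_index()
)
print(out)
```

Because the output columns are named directly in agg, no column named keyword is produced alongside the keyword index, so a plain reset_index() is enough.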