I have a pandas DataFrame df:
+------+---------+
| team | user |
+------+---------+
| A | elmer |
| A | daffy |
| A | bugs |
| B | dawg |
| A | foghorn |
| B | speedy |
| A | goofy |
| A | marvin |
| B | pepe |
| C | petunia |
| C | porky |
+------+---------
I want to find or write a function to return a DataFrame that I would return in MySQL using the following:
SELECT
team,
GROUP_CONCAT(user)
FROM
df
GROUP BY
team
for the following result:
+------+---------------------------------------+
| team | group_concat(user) |
+------+---------------------------------------+
| A | elmer,daffy,bugs,foghorn,goofy,marvin |
| B | dawg,speedy,pepe |
| C | petunia,porky |
+------+---------------------------------------+
I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Parameters bymapping, function, label, or list of labels. Used to determine the groups for the groupby.
iterrows() returns a copy of the dataframe contents in tuple, so updating it will have no effect on actual dataframe. So, to update the contents of dataframe we need to iterate over the rows of dataframe using iterrows() and then access each row using at() to update it's contents.
Do the following:
df.groupby('team').apply(lambda x: ','.join(x.user))
to get a Series
of strings or
df.groupby('team').apply(lambda x: list(x.user))
to get a Series
of list
s of strings.
Here's what the results look like:
In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
a elmer, daffy, bugs, foghorn, goofy, marvin
b dawg, speedy, pepe
c petunia, porky
dtype: object
In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
a [elmer, daffy, bugs, foghorn, goofy, marvin]
b [dawg, speedy, pepe]
c [petunia, porky]
dtype: object
Note that in general any further operations on these types of Series
will be slow and are generally discouraged. If there's another way to aggregate without putting a list
inside of a Series
you should consider using that approach instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With