I have a pandas DataFrame df:
+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------  
I want to find or write a function to return a DataFrame that I would return in MySQL using the following:
SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team
for the following result:
+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+  
I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Parameters bymapping, function, label, or list of labels. Used to determine the groups for the groupby.
iterrows() returns a copy of the dataframe contents in tuple, so updating it will have no effect on actual dataframe. So, to update the contents of dataframe we need to iterate over the rows of dataframe using iterrows() and then access each row using at() to update it's contents.
Do the following:
df.groupby('team').apply(lambda x: ','.join(x.user))
to get a Series of strings or
df.groupby('team').apply(lambda x: list(x.user))
to get a Series of lists of strings.
Here's what the results look like:
In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
a       elmer, daffy, bugs, foghorn, goofy, marvin
b                               dawg, speedy, pepe
c                                   petunia, porky
dtype: object
In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
a       [elmer, daffy, bugs, foghorn, goofy, marvin]
b                               [dawg, speedy, pepe]
c                                   [petunia, porky]
dtype: object
Note that in general any further operations on these types of Series will be slow and are generally discouraged. If there's another way to aggregate without putting   a list inside of a Series you should consider using that approach instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With