Replicating GROUP_CONCAT for pandas.DataFrame

Tags: python, pandas, mysql

I have a pandas DataFrame df:

+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------+

I want to find or write a function that returns the DataFrame I would get in MySQL from the following query:

SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team

for the following result:

+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+

I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.

778

asked Aug 09 '13 01:08

Mitch Flax

1 Answer

Do the following:

df.groupby('team').apply(lambda x: ','.join(x.user))

to get a Series of strings or

df.groupby('team').apply(lambda x: list(x.user))

to get a Series of lists of strings.

Here's what the results look like:

In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
A       elmer, daffy, bugs, foghorn, goofy, marvin
B                               dawg, speedy, pepe
C                                   petunia, porky
dtype: object

In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
A       [elmer, daffy, bugs, foghorn, goofy, marvin]
B                               [dawg, speedy, pepe]
C                                   [petunia, porky]
dtype: object
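
If you want a DataFrame back (as in the MySQL result) rather than a Series, one possible variant is to select the user column before aggregating and then reset the index. This is only a sketch: the DataFrame is rebuilt here from the table in the question so the snippet runs on its own, and the output column name "group_concat(user)" is just chosen to mirror the MySQL header.

```python
import pandas as pd

# Rebuild the DataFrame from the question so the example is self-contained.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Select the column first, join each group's values with a comma,
# then turn the team index back into an ordinary column.
result = (
    df.groupby("team")["user"]
      .agg(",".join)
      .reset_index(name="group_concat(user)")
)

print(result)
```

Selecting ["user"] before aggregating means the function only ever sees that one column, so no lambda is needed. Rows keep their original order within each team.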

Note that, in general, further operations on these kinds of Series will be slow and are discouraged: their dtype is object, so pandas has to handle each element as a generic Python object instead of using vectorized code. If there is another way to aggregate without putting a list inside a Series, consider using that approach instead.
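
To illustrate that last point with a hypothetical follow-up task (not part of the question): suppose what you really need is the number of users per team. The sketch below compares going through a Series of lists with asking the groupby for the count directly. The DataFrame is again rebuilt from the question's table.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Roundabout route: build a Series of Python lists, then call len()
# on each element one at a time.
as_lists = df.groupby("team")["user"].apply(list)
counts_via_lists = as_lists.apply(len)

# Direct route: let the groupby compute the count itself,
# so no lists are ever stored inside a Series.
counts_direct = df.groupby("team")["user"].count()

print(counts_direct)
```

Both routes give the same numbers here, but the direct one skips the intermediate list objects, which is usually what you want on larger data.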

152

answered Oct 11 '22 13:10

Phillip Cloud
