Concatenate strings from several rows using Pandas groupby


You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

Here I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates so that only one row per group remains.
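
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is an assumption: it is reconstructed from the outputs shown in this answer, since the original df is not reproduced here. It also uses assign so the original 'text' column is not overwritten.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs in this answer.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform returns one value per original row, so every row in a group
# receives the same joined string.
joined = df.groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))

# assign builds a new frame, leaving the original 'text' column untouched;
# drop_duplicates then keeps the first row of each group.
result = df.assign(text=joined)[['name', 'text', 'month']].drop_duplicates()
print(result)
```

Because transform keeps the original index, the surviving rows carry the labels of the first row in each group (0, 2, 4, 6 for this sample).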

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

Update

The lambda is unnecessary here:

In [38]:

df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
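
A quick way to convince yourself that the two spellings are interchangeable is to compare their results directly. This is only a sketch on assumed sample data (reconstructed from the outputs above, not taken from the original question):

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs in this answer.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

grouped = df.groupby(['name', 'month'])['text']

# ','.join is already a callable that accepts an iterable of strings,
# so wrapping it in a lambda adds nothing.
with_lambda = grouped.apply(lambda x: ','.join(x)).reset_index()
without_lambda = grouped.apply(','.join).reset_index()

print(with_lambda.equals(without_lambda))
```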

We can group by the 'name' and 'month' columns, then call the agg() method on the resulting pandas GroupBy object.

agg() lets you calculate several statistics per group in a single call.

df.groupby(['name', 'month'], as_index = False).agg({'text': ' '.join})
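
To illustrate the "several statistics in one call" point, here is a sketch that uses named aggregation to join the text and count the rows per group at the same time. The sample frame and the n_rows column name are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs shown earlier.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# Named aggregation: each keyword is an output column, each value is
# a (source column, aggregation function) pair.
summary = df.groupby(['name', 'month'], as_index=False).agg(
    text=('text', ' '.join),     # space-separated concatenation
    n_rows=('text', 'count'),    # number of non-null strings joined
)
print(summary)
```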


The answer by EdChum gives you a lot of flexibility, but if you just want to collect the strings into a column of list objects, you can also do:

output_series = df.groupby(['name','month'])['text'].apply(list)

If you want to collect your "text" values in a list:

df.groupby(['name', 'month'], as_index = False).agg({'text': list})
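
One reason to keep lists rather than joining right away is that the separator can be chosen later. A small sketch, again on assumed sample data, with a hypothetical 'joined' column added via Series.str.join:

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the outputs shown earlier.
df = pd.DataFrame({
    'name':  ['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# Each cell of 'text' becomes a Python list of the group's strings.
lists = df.groupby(['name', 'month'], as_index=False).agg({'text': list})

# str.join concatenates the elements of each list with the given separator.
lists['joined'] = lists['text'].str.join(',')
print(lists)
```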

For me the above solutions were close but added some unwanted \n's and dtype: object, so here's a modified version:

df.groupby(['name', 'month'])['text'].apply(lambda text: text.to_string(index=False)).str.replace('\n', '').reset_index()

This is an old question, but just in case: I used the code below and it works like a charm. Note that it assumes the frame has a datetime 'date' column.

text = ''.join(df[df['date'].dt.month==8]['text'])
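
A self-contained sketch of the same idea follows. The frame, its dates and its values are invented for illustration; the only requirement is that 'date' has a datetime dtype so that the .dt accessor is available.

```python
import pandas as pd

# Hypothetical frame with a datetime 'date' column (not the df used above).
df_dates = pd.DataFrame({
    'date': pd.to_datetime(['2021-07-30', '2021-08-01', '2021-08-15', '2021-09-02']),
    'text': ['a', 'b', 'c', 'd'],
})

# Boolean mask selecting the August rows, then one plain str.join over them.
august_text = ''.join(df_dates.loc[df_dates['date'].dt.month == 8, 'text'])
print(august_text)  # bc
```

Unlike the groupby approaches, this yields a single string for one chosen month rather than one string per group.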

Please try this line of code:

df.groupby(['name','month'])['text'].apply(','.join).reset_index()
