How to combine multiple rows of strings into one using pandas?

I have a DataFrame with multiple rows. Is there any way in which they can be combined to form one string?

For example:

     words
0    I, will, hereby
1    am, gonna
2    going, far
3    to
4    do
5    this

Expected output:

I, will, hereby, am, gonna, going, far, to, do, this
eclairs asked Oct 22 '15 11:10



3 Answers

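You can use str.cat to join all the strings in a column into a single string. Called without a second Series, it concatenates the values of the Series itself. For a Series or column s, write: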

>>> s.str.cat(sep=', ')
'I, will, hereby, am, gonna, going, far, to, do, this'
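For reference, here is a minimal self-contained sketch of the same call, assuming the DataFrame from the question is named df and its column is named words (the variable name combined is only illustrative):

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "words": ["I, will, hereby", "am, gonna", "going, far", "to", "do", "this"]
})

# With no `others` argument, str.cat joins every value in the column
# into one string, inserting `sep` between consecutive rows.
combined = df["words"].str.cat(sep=", ")
print(combined)  # I, will, hereby, am, gonna, going, far, to, do, this
```

The result is a plain Python str, not a Series.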
Alex Riley answered Oct 12 '22 02:10


How about Python's built-in str.join? It gives the same result and is faster:

In [209]: ', '.join(df.words)
Out[209]: 'I, will, hereby, am, gonna, going, far, to, do, this'

Timings from December 2016 on pandas 0.18.1:

In [214]: df.shape
Out[214]: (6, 1)

In [215]: %timeit df.words.str.cat(sep=', ')
10000 loops, best of 3: 72.2 µs per loop

In [216]: %timeit ', '.join(df.words)
100000 loops, best of 3: 14 µs per loop

In [217]: df = pd.concat([df]*10000, ignore_index=True)

In [218]: df.shape
Out[218]: (60000, 1)

In [219]: %timeit df.words.str.cat(sep=', ')
100 loops, best of 3: 5.2 ms per loop

In [220]: %timeit ', '.join(df.words)
100 loops, best of 3: 1.91 ms per loop
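One caveat, not covered by the timings above: str.join raises a TypeError if the column contains anything that is not a string, including missing values. A minimal sketch of one way to guard against that, assuming you want missing rows skipped (join_words is a hypothetical helper name, not a pandas function):

```python
import pandas as pd

def join_words(series, sep=", "):
    """Join the non-missing values of a Series into one string."""
    # dropna() removes NaN/None; astype(str) converts any remaining
    # non-string values so that str.join does not raise TypeError.
    return sep.join(series.dropna().astype(str))

# Example column with one missing row.
words = pd.Series(["I, will, hereby", None, "to", "do", "this"])
result = join_words(words)
print(result)  # I, will, hereby, to, do, this
```

The extra dropna and astype steps cost some time, so the speed advantage measured above may be smaller with this variant.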
Zero answered Oct 12 '22 02:10


If you have a DataFrame rather than a Series and you want to concatenate values (text values only, I think) from different rows, using another column as the 'group by' key, you can use the .agg method of DataFrameGroupBy.

Sample code tested with Pandas v0.18.1:

import pandas as pd

df = pd.DataFrame({
    'category': ['A'] * 3 + ['B'] * 2,
    'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
    'num': range(1, 6)
})

# One row per category: join the names, keep the largest num.
df.groupby('category').agg({
    'name': lambda x: ', '.join(x),
    'num': lambda x: x.max()
})
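On newer pandas versions (0.25 and later), the same grouping can also be sketched with named aggregation, which lets you choose the output column names. This is an alternative spelling, not the code tested above; the output names names and max_num are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A"] * 3 + ["B"] * 2,
    "name": ["A1", "A2", "A3", "B1", "B2"],
    "num": range(1, 6),
})

# Named aggregation: output_column=(input_column, aggregation).
# ', '.join is called once per group with that group's 'name' values.
out = df.groupby("category").agg(
    names=("name", ", ".join),
    max_num=("num", "max"),
)
print(out)
```

The result is indexed by category, with one joined string and one maximum per group.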
Zhong Dai answered Oct 12 '22 04:10