I have a DataFrame with multiple rows. Is there any way in which they can be combined to form one string?
For example:
             words
0  I, will, hereby
1        am, gonna
2       going, far
3               to
4               do
5             this
Expected output:
I, will, hereby, am, gonna, going, far, to, do, this
The pandas Series.str.cat() method concatenates the strings in a Series. Called without another Series, it joins all elements of the caller into one string. When a second Series is passed, the two are concatenated element by element, and the second Series is expected to have the same length as the caller. The .str prefix is required to distinguish the method from Python's built-in string methods.
The concat() function in pandas serves a different purpose: it appends rows or columns from one DataFrame to another, and it does not join strings.
With the + operator you can concatenate two or more text/string columns in a pandas DataFrame. Note that when you apply + to numeric columns it performs addition instead of concatenation.
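To make the difference between these element-wise tools concrete, here is a small sketch. The column names `first`, `second` and `n` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical columns, chosen only to illustrate the behaviour.
df = pd.DataFrame({
    "first": ["I", "am", "going"],
    "second": ["will", "gonna", "far"],
    "n": [1, 2, 3],
})

# str.cat with another Series: row i of "first" is joined with row i of "second".
paired_cat = df["first"].str.cat(df["second"], sep=", ")

# The + operator gives the same result for string columns.
paired_plus = df["first"] + ", " + df["second"]

# On a numeric column, + is arithmetic addition, not concatenation.
doubled = df["n"] + df["n"]

# Cast to str first if the digits should be glued together as text.
as_text = df["n"].astype(str) + df["n"].astype(str)

print(paired_cat.tolist())
print(doubled.tolist())
print(as_text.tolist())
```

Note that both forms work row by row across columns. Neither of them collapses a whole column into one string, which is what the question asks for.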
You can use str.cat to join the strings from all rows into one string. For a Series or column s, write:
>>> s.str.cat(sep=', ')
'I, will, hereby, am, gonna, going, far, to, do, this'
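For a self-contained version, the sketch below rebuilds the question's DataFrame and also shows how str.cat treats missing values. The `with_gap` Series is an invented example, not data from the question.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    "words": ["I, will, hereby", "am, gonna", "going, far", "to", "do", "this"]
})

# Collapse the whole column into a single string.
combined = df["words"].str.cat(sep=", ")
print(combined)

# Invented Series with a missing value, to show the na_rep parameter.
with_gap = pd.Series(["to", None, "this"])

# With no na_rep, missing values are left out of the result.
skipped = with_gap.str.cat(sep=", ")

# With na_rep, missing values are replaced by the given placeholder.
marked = with_gap.str.cat(sep=", ", na_rep="?")

print(skipped)
print(marked)
```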
How about plain Python's str.join? It's also faster.
In [209]: ', '.join(df.words)
Out[209]: 'I, will, hereby, am, gonna, going, far, to, do, this'
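One caveat with str.join is that it only accepts strings, so a column containing missing or numeric values needs cleaning first. A minimal sketch, using an invented Series rather than the question's data:

```python
import pandas as pd

# Invented example with a missing value in the middle.
words = pd.Series(["to", None, "this"])

# str.join raises TypeError as soon as it meets a non-string item.
error = None
try:
    ", ".join(words)
except TypeError as exc:
    error = exc

# Dropping missing values and casting to str makes the join safe.
cleaned = ", ".join(words.dropna().astype(str))

# The same cast lets numeric columns be joined as text.
numbers = ", ".join(pd.Series([1, 2, 3]).astype(str))

print(type(error).__name__)
print(cleaned)
print(numbers)
```

Whether to drop missing values or substitute a placeholder depends on the data. The cast via astype(str) is just one reasonable choice.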
Timings from December 2016 on pandas 0.18.1:
In [214]: df.shape
Out[214]: (6, 1)
In [215]: %timeit df.words.str.cat(sep=', ')
10000 loops, best of 3: 72.2 µs per loop
In [216]: %timeit ', '.join(df.words)
100000 loops, best of 3: 14 µs per loop
In [217]: df = pd.concat([df]*10000, ignore_index=True)
In [218]: df.shape
Out[218]: (60000, 1)
In [219]: %timeit df.words.str.cat(sep=', ')
100 loops, best of 3: 5.2 ms per loop
In [220]: %timeit ', '.join(df.words)
100 loops, best of 3: 1.91 ms per loop
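The numbers above come from one machine and an old pandas release, so it is worth re-measuring on your own setup. The sketch below repeats the comparison outside IPython with the standard timeit module. The repetition count of 20 is an arbitrary choice.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "words": ["I, will, hereby", "am, gonna", "going, far", "to", "do", "this"]
})

# Same enlargement as in the timings above: 6 rows repeated 10000 times.
big = pd.concat([df] * 10000, ignore_index=True)

runs = 20  # arbitrary number of repetitions per measurement

t_cat = timeit.timeit(lambda: big["words"].str.cat(sep=", "), number=runs) / runs
t_join = timeit.timeit(lambda: ", ".join(big["words"]), number=runs) / runs

print(f"str.cat : {t_cat * 1e3:.2f} ms per call")
print(f"str.join: {t_join * 1e3:.2f} ms per call")

# Both approaches should produce the identical string.
same = big["words"].str.cat(sep=", ") == ", ".join(big["words"])
print(same)
```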
If you have a DataFrame rather than a Series and you want to concatenate values (I think text values only) from different rows, using another column as a 'group by' key, then you can use the .agg method of the DataFrameGroupBy class. See the pandas API reference for DataFrameGroupBy.agg.
Sample code tested with Pandas v0.18.1:
import pandas as pd

df = pd.DataFrame({
    'category': ['A'] * 3 + ['B'] * 2,
    'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
    'num': range(1, 6)
})

# One row per category: join the names, keep the largest num.
df.groupby('category').agg({
    'name': lambda x: ', '.join(x),
    'num': lambda x: x.max()
})
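As a variation on the sample above, the sketch below passes the bound method ', '.join directly instead of wrapping it in a lambda, and uses named aggregation to choose the output column names. Named aggregation needs pandas 0.25 or later, so it goes beyond the v0.18.1 setup the sample was tested with. The output names `names` and `max_num` are made up here.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A"] * 3 + ["B"] * 2,
    "name": ["A1", "A2", "A3", "B1", "B2"],
    "num": range(1, 6),
})

# Join the names within each category; ", ".join is used as the aggregator.
joined = df.groupby("category")["name"].agg(", ".join)

# Named aggregation (pandas >= 0.25): output column = (input column, function).
summary = df.groupby("category").agg(
    names=("name", ", ".join),
    max_num=("num", "max"),
)

print(joined.to_dict())
print(summary)
```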