Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging data frame columns of strings into one single column in Pandas

Tags:

python

pandas

I have columns in a dataframe (imported from a CSV) containing text like this.

"New york", "Atlanta", "Mumbai"
"Beijing", "Paris", "Budapest"
"Brussels", "Oslo", "Singapore"

I want to collapse/merge all the columns into one single column, like this

New york Atlanta
Beijing Paris Budapest
Brussels Oslo Singapore

How to do it in pandas?

like image 819
pbu Avatar asked Dec 05 '22 00:12

pbu


2 Answers

A faster (but uglier) version is with .cat:

df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')

0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
Name: 0, dtype: object

On a larger (10kx5) DataFrame:

%timeit df.apply(" ".join, axis=1)
10 loops, best of 3: 112 ms per loop

%timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
100 loops, best of 3: 4.48 ms per loop
like image 196
ayhan Avatar answered Dec 06 '22 13:12

ayhan


Suppose you have a DataFrame like so:

>>> df
          0        1          2
0  New york  Atlanta     Mumbai
1   Beijing    Paris   Budapest
2  Brussels     Oslo  Singapore

Then, a simple use of the pd.DataFrame.apply method will work nicely:

>>> df.apply(" ".join, axis=1)
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object

Note, I have to pass axis=1 so that it is applied across the columns, rather than down the rows. I.e:

>>> df.apply(" ".join, axis=0)
0    New york Beijing Brussels
1           Atlanta Paris Oslo
2    Mumbai Budapest Singapore
dtype: object
like image 20
juanpa.arrivillaga Avatar answered Dec 06 '22 13:12

juanpa.arrivillaga