
Merge multiple column values into one column in python pandas

I have a pandas data frame like this:

       Column1  Column2  Column3  Column4  Column5
    0        a        1        2        3        4
    1        a        3        4        5
    2        b        6        7        8
    3        c        7        7

What I want now is a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all the values from Column2 through the last filled column of each row, joined with commas, like this:

      Column1  ColumnA
    0       a  1,2,3,4
    1       a    3,4,5
    2       b    6,7,8
    3       c      7,7

How could I best approach this issue? Any advice would be helpful. Thanks in advance!

sequence_hard asked Oct 13 '15 09:10





2 Answers

You can call apply with axis=1 to work row-wise, then drop the NaN values, convert each remaining value to str, and join them:

    In [153]:
    df['ColumnA'] = df[df.columns[1:]].apply(
        # drop NaN, cast float -> int -> str, then join with commas
        lambda x: ','.join(x.dropna().astype(int).astype(str)),
        axis=1
    )
    df

    Out[153]:
      Column1  Column2  Column3  Column4  Column5  ColumnA
    0       a        1        2        3        4  1,2,3,4
    1       a        3        4        5      NaN    3,4,5
    2       b        6        7        8      NaN    6,7,8
    3       c        7        7      NaN      NaN      7,7

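Here I call dropna to get rid of the NaN values. Because NaN forces each row's values to float, the remaining values also need casting back to int before the conversion to str; otherwise the joined string would contain floats such as 3.0 instead of 3.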
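For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame construction is an assumption about how the question's data looks once the blank cells are read as NaN, and `value_cols`, `join_row` and `result` are illustrative names only, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data; blank cells become NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

value_cols = df.columns[1:]  # every column after Column1


def join_row(row):
    # Drop missing cells, cast float -> int -> str, then join with commas.
    return ",".join(row.dropna().astype(int).astype(str))


df["ColumnA"] = df[value_cols].apply(join_row, axis=1)

# Keep only the two columns the question asks for.
result = df[["Column1", "ColumnA"]]
print(result)
```

Selecting `df.columns[1:]` rather than naming the columns means the sketch should keep working if more value columns are added, as long as Column1 stays first and the values are whole numbers.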

EdChum answered Oct 22 '22 16:10



I suggest using .assign:

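    # Build ColumnA by casting each column to str and concatenating with '+'.
    # Note: rows with missing values are not handled specially here.
    df2 = df.assign(
        ColumnA=df.Column2.astype(str) + ', '
        + df.Column3.astype(str) + ', '
        + df.Column4.astype(str) + ', '
        + df.Column5.astype(str)
    )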

It's simple, if a bit long, and it worked for me.

Amin Salgado answered Oct 22 '22 14:10
