 

Pandas: concatenate a list of columns into one column

Tags:

python

pandas

I am wondering whether I could build a function like this in Pandas:

    def concatenate(df,columnlist,newcolumn):
        # df is the dataframe,
        # columnlist is the list containing the names of all the columns I want to concatenate, and
        # newcolumn is the name of the resulting new column

        for c in columnlist:
            ...some Pandas functions

        return df # this one has the concatenated "newcolumn"

I am asking because len(columnlist) is going to be very large and will change from run to run. Thanks!

LarryZ asked Nov 25 '17 00:11




2 Answers

Try this:

    import numpy as np
    np.add.reduce(df[columnlist], axis=1)

This "adds" the values in each row; for strings, adding means concatenating ("abc" + "de" == "abcde").
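A minimal sketch wrapping this into the concatenate(df, columnlist, newcolumn) signature from the question, assuming every listed column holds strings. It converts the selected columns to a NumPy object array first, so np.add.reduce applies Python's + to the values of each row. The sample frame and the column name "ABC" are made up for illustration.

```python
import numpy as np
import pandas as pd


def concatenate(df, columnlist, newcolumn):
    """Return a copy of df with `newcolumn` holding the row-wise
    concatenation of the string columns named in `columnlist`."""
    # Object array: np.add on Python str objects means string concatenation.
    values = df[columnlist].to_numpy(dtype=object)
    out = df.copy()
    # Reduce across the columns (axis=1), producing one string per row.
    out[newcolumn] = np.add.reduce(values, axis=1)
    return out


df = pd.DataFrame({"A": ["aaa", "bbb"], "B": ["ddd", "eee"], "C": ["x", "y"]})
result = concatenate(df, ["A", "B", "C"], "ABC")
print(result["ABC"].tolist())  # ['aaadddx', 'bbbeeey']
```

Because the column list is just a parameter, a long or dynamically built columnlist needs no hand-written loop over the columns.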


Originally I thought you wanted to concatenate them lengthwise, into a single longer series of all the values. If anyone else wants to do that, here's the code:

    pd.concat(map(df.get, columnlist)).reset_index(drop=True)
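A small example of the lengthwise version on a made-up frame: df.get returns each named column as a Series, pd.concat stacks them end to end, and reset_index(drop=True) replaces the repeated row labels with a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({"A": ["aaa", "bbb"], "B": ["ddd", "eee"]})
columnlist = ["A", "B"]

# Stack column A, then column B, into one long Series.
stacked = pd.concat(map(df.get, columnlist)).reset_index(drop=True)
print(stacked.tolist())  # ['aaa', 'bbb', 'ddd', 'eee']
```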
John Zwinck answered Nov 06 '22 00:11


Given a dataframe like this:

    df

         A    B
    0  aaa  ddd
    1  bbb  eee
    2  ccc  fff

You can simply use df.sum along axis 1, provided every column is a string column:

    df.sum(1)

    0    aaaddd
    1    bbbeee
    2    cccfff
    dtype: object

If the columns are not all strings, convert them first:

    df.astype(str).sum(1)
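To see why the conversion matters, here is a sketch on a hypothetical frame that mixes an integer column with a string column: combining an int and a str with + raises a TypeError, while after astype(str) every cell is a string. For the row-wise join this sketch uses apply("".join, axis=1) instead of sum; for rows of strings the two give the same result.

```python
import pandas as pd

# Hypothetical frame mixing an integer column with a string column.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["aaa", "bbb", "ccc"]})

# Convert every cell to its string form first.
as_text = df.astype(str)

# Join each row's strings with no separator.
combined = as_text.apply("".join, axis=1)
print(combined.tolist())  # ['1aaa', '2bbb', '3ccc']
```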

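If you need to select a subset of your data (for example, only the string columns), you can use select_dtypes. In pandas versions where string columns have the object dtype, select them with 'object' (passing 'str' raises a TypeError there):

    df.select_dtypes(include=['object']).sum(1)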
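A sketch on a hypothetical mixed frame: select_dtypes(include="object") keeps only the object-dtype columns, which are then joined row by row. Object columns can also hold non-string values, and depending on the pandas version, string columns may be inferred as a dedicated string dtype rather than object, so this sketch sets dtype=object explicitly to make the selection unambiguous.

```python
import pandas as pd

# Hypothetical frame; the string columns are forced to object dtype.
df = pd.DataFrame({
    "A": pd.Series(["aaa", "bbb"], dtype=object),
    "n": [1, 2],
    "B": pd.Series(["ddd", "eee"], dtype=object),
})

strings_only = df.select_dtypes(include="object")  # keeps A and B, drops n
print(list(strings_only.columns))  # ['A', 'B']

joined = strings_only.apply("".join, axis=1)
print(joined.tolist())  # ['aaaddd', 'bbbeee']
```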

To select specific columns by name:

    df[['A', 'B']].sum(1)

In every case, the result is a new Series and df is not modified in place, so if you want to keep the result, assign it:

    r = df.sum(1)
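A short sketch of persisting the result as a new column, which is what the question's newcolumn argument asks for. The frame and the column name "AB" are made up; the frame is built with dtype=object so that sum(axis=1) operates on plain Python strings.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["aaa", "bbb", "ccc"], "B": ["ddd", "eee", "fff"]}, dtype=object
)

r = df.sum(axis=1)       # a new Series; df itself is unchanged
print(list(df.columns))  # ['A', 'B']

df["AB"] = r             # persist it as a new column
print(df["AB"].tolist())  # ['aaaddd', 'bbbeee', 'cccfff']
```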
cs95 answered Nov 05 '22 23:11