Concatenate all columns in a pandas dataframe

Tags: python, pandas

I have multiple pandas DataFrames with differing numbers of columns, typically between 50 and 100. I need to create a final column that is simply all the columns concatenated: the string in the first row of the new column should be the concatenation of the first-row strings of all the other columns, and likewise for every other row. I wrote the loop below, but I feel there might be a better, more efficient way to do this. Any ideas?

num_columns = df.columns.shape[0]
col_names = df.columns.values.tolist()
df.loc[:, 'merged'] = ""
for each_col_ind in range(num_columns):
    print('Concatenating', col_names[each_col_ind])
    df.loc[:, 'merged'] = df.loc[:, 'merged'] + df[col_names[each_col_ind]]

691

asked Mar 14 '17 13:03

Clock Slave

2 Answers

One solution uses sum. The output is a float, so it has to be converted to int and then to str:

df['new'] = df.sum(axis=1).astype(int).astype(str)
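The integer round trip in the line above has a side effect worth checking before relying on it: any leading zeros in the concatenated text are lost, and text that is not numeric cannot be cast to int at all. The following is a minimal sketch of the first point, using a made-up Series of already-concatenated strings (the name `joined` is illustrative, not from the original answer):

```python
import pandas as pd

# Hypothetical concatenated values; the first one starts with a zero.
joined = pd.Series(["0147", "258"])

# Casting to int and back to str silently drops the leading zero.
round_trip = joined.astype(int).astype(str)

print(round_trip.tolist())
```

If leading zeros or non-numeric text can occur in the data, one of the string-based approaches below is probably the safer choice.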

Another solution uses apply with join, but it is the slowest:

df['new'] = df.apply(''.join, axis=1)

Last, a very fast numpy solution: convert to a numpy array and then sum:

df['new'] = df.values.sum(axis=1)

Timings:

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})
#[30000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
#print (df)

cols = list('ABC')

#not_a_robot solution
In [259]: %timeit df['concat'] = pd.Series(df[cols].fillna('').values.tolist()).str.join('')
100 loops, best of 3: 17.4 ms per loop

In [260]: %timeit df['new'] = df[cols].astype(str).apply(''.join, axis=1)
1 loop, best of 3: 386 ms per loop

In [261]: %timeit df['new1'] = df[cols].values.sum(axis=1)
100 loops, best of 3: 6.5 ms per loop

In [262]: %timeit df['new2'] = df[cols].astype(str).sum(axis=1).astype(int).astype(str)
10 loops, best of 3: 68.6 ms per loop
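The timings above come from an IPython session. As a rough sketch of how the comparison could be re-run as a plain script, the block below uses the standard timeit module on the same 30000-row frame. The helper names (`list_join`, `apply_join`, `numpy_sum`) are made up for this example, `to_numpy(dtype=object)` is used in place of `.values` as an assumption to force an object array, and the numbers it prints will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": ["1", "2", "3"], "B": ["4", "5", "6"], "C": ["7", "8", "9"]})
df = pd.concat([df] * 10000).reset_index(drop=True)  # 30000 rows x 3 columns
cols = list("ABC")


def list_join(frame):
    # Rows as Python lists, joined with the .str accessor.
    rows = frame[cols].fillna("").values.tolist()
    return pd.Series(rows, index=frame.index).str.join("")


def apply_join(frame):
    # Row-wise apply of ''.join.
    return frame[cols].astype(str).apply("".join, axis=1)


def numpy_sum(frame):
    # Summing an object array along axis 1 concatenates the strings.
    values = frame[cols].to_numpy(dtype=object).sum(axis=1)
    return pd.Series(values, index=frame.index)


candidates = {"list_join": list_join, "apply_join": apply_join, "numpy_sum": numpy_sum}
expected = list_join(df).tolist()

for name, func in candidates.items():
    # Check that every approach produces the same strings before timing it.
    assert func(df).tolist() == expected
    seconds = timeit.timeit(lambda: func(df), number=3) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```

Checking that all candidates agree before timing them guards against comparing approaches that do not actually compute the same thing.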

EDIT: If the dtypes of some columns are not object (that is, not strings), cast them with DataFrame.astype first:

df['new'] = df.astype(str).values.sum(axis=1)
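To illustrate the cast described in the edit above, here is a small sketch on a made-up frame with one integer column and one string column. It uses `to_numpy(dtype=object)` rather than `.values`, which is an assumption made here so that the array is guaranteed to hold Python strings regardless of the pandas version.

```python
import pandas as pd

# Hypothetical frame mixing an integer column with a string column.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})

# Cast everything to text first, then let numpy "sum" (concatenate) each row.
as_text = df.astype(str)
df["new"] = as_text.to_numpy(dtype=object).sum(axis=1)

print(df["new"].tolist())
```

Without the cast, adding an integer to a string in the row-wise sum would fail, which is why the conversion has to come first.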

150

answered Sep 28 '22 05:09

jezrael

import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})

df['concat'] = pd.Series(df.fillna('').values.tolist()).str.join('')

Gives us:

df
Out[6]: 
   A  B  C concat
0  1  4  7    147
1  2  5  8    258
2  3  6  9    369

To select a given set of columns:

df['concat'] = pd.Series(df[['A', 'B']].fillna('').values.tolist()).str.join('')

df
Out[8]: 
   A  B  C concat
0  1  4  7     14
1  2  5  8     25
2  3  6  9     36

However, I've noticed that this approach can sometimes populate NaNs where it shouldn't, so here's another way:

>>> from functools import reduce
>>> df['concat'] = df[['A', 'B', 'C']].apply(lambda x: reduce(lambda a, b: a + b, x), axis=1)
>>> df
   A  B  C concat
0  1  4  7    147
1  2  5  8    258
2  3  6  9    369
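One likely cause of the unexpected NaNs mentioned above is index alignment: a Series built from a plain list gets the default index 0, 1, 2, ..., and assigning it to a column aligns on index labels rather than on position. The sketch below, on a made-up frame whose index is deliberately not 0..n-1, shows the effect and one possible fix (passing `index=df.index`). This is offered as a plausible explanation, not as a statement of what the answer's author actually ran into.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["1", "2", "3"], "B": ["4", "5", "6"]},
    index=[10, 11, 12],  # deliberately not the default 0, 1, 2
)

rows = df.fillna("").values.tolist()

# The new Series is labelled 0, 1, 2; none of those labels exist in df,
# so the assignment fills the column with NaN.
df["misaligned"] = pd.Series(rows).str.join("")

# Reusing the frame's own index keeps each joined string on its row.
df["concat"] = pd.Series(rows, index=df.index).str.join("")

print(df)
```

Calling `reset_index(drop=True)` on the frame beforehand would be another way to avoid the mismatch, at the cost of discarding the original index.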

Note, though, that this approach is a lot slower:

$ python3 -m timeit 'import pandas as pd;from functools import reduce; df=pd.DataFrame({"a": ["this", "is", "a", "string"] * 5000, "b": ["this", "is", "a", "string"] * 5000});[df[["a", "b"]].apply(lambda x: reduce(lambda a, b: a + b, x)) for _ in range(10)]'
10 loops, best of 3: 451 msec per loop

Versus

$ python3 -m timeit 'import pandas as pd;from functools import reduce; df=pd.DataFrame({"a": ["this", "is", "a", "string"] * 5000, "b": ["this", "is", "a", "string"] * 5000});[pd.Series(df[["a", "b"]].fillna("").values.tolist()).str.join(" ") for _ in range(10)]'
10 loops, best of 3: 98.5 msec per loop

answered Sep 28 '22 04:09

blacksite
