How to concatenate multiple pandas DataFrame columns with a different token separator between each pair?

Tags: python, python-3.x, pandas, dataframe

I am trying to concatenate multiple pandas DataFrame columns, using a different token as the separator between each pair of columns.

For example, my dataset looks like this:

import pandas as pd

dataframe = pd.DataFrame({'col_1' : ['aaa','bbb','ccc','ddd'],
                          'col_2' : ['name_aaa','name_bbb','name_ccc','name_ddd'],
                          'col_3' : ['job_aaa','job_bbb','job_ccc','job_ddd']})

I want to output something like this:

    features
0   aaa <0> name_aaa <1> job_aaa
1   bbb <0> name_bbb <1> job_bbb
2   ccc <0> name_ccc <1> job_ccc
3   ddd <0> name_ddd <1> job_ddd

Explanation:

Join the columns of each row, separating them with "<{}>" tokens, where {} is an increasing number starting at 0.

What I've tried so far:

I don't want to modify the original DataFrame, so I created two new DataFrames:

features_df = pd.DataFrame()
final_df    = pd.DataFrame()
for iters in range(len(dataframe.columns)):
    features_df[dataframe.columns[iters]] = dataframe[dataframe.columns[iters]] + ' ' + "<{}>".format(iters)
final_df['features'] = features_df[features_df.columns].agg(' '.join, axis=1)

The issue I am facing is that this adds <2> after the last column, whereas I want the output shown above. This also doesn't feel like the idiomatic pandas way to do the task. How can I make it more efficient?

asked May 24 '20 08:05

Aaditya Ura

1 Answer

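from itertools import chain

# For each row, interleave the values with ' <i> ' tokens, then drop the
# last token with [:-1] so that nothing follows the final column.
# Note: this adds the 'features' column to the original dataframe.
dataframe['features'] = dataframe.apply(lambda x: ''.join([*chain.from_iterable((v, f' <{i}> ') for i, v in enumerate(x))][:-1]), axis=1)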

print(dataframe)

Prints:

  col_1     col_2    col_3                      features
0   aaa  name_aaa  job_aaa  aaa <0> name_aaa <1> job_aaa
1   bbb  name_bbb  job_bbb  bbb <0> name_bbb <1> job_bbb
2   ccc  name_ccc  job_ccc  ccc <0> name_ccc <1> job_ccc
3   ddd  name_ddd  job_ddd  ddd <0> name_ddd <1> job_ddd
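
If you would rather leave the original DataFrame untouched and avoid a row-wise apply, here is a minimal sketch of a column-wise alternative. The helper name join_with_tokens is my own and is not a pandas function. It assumes the DataFrame has at least one column, and it casts every value to str so that non-string columns do not raise. The token is placed before each column except the first, so no trailing token appears.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
    'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
    'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd'],
})


def join_with_tokens(df):
    """Join the columns of df row-wise, separated by ' <0> ', ' <1> ', ...

    Hypothetical helper for illustration; assumes df has at least one column.
    """
    cols = list(df.columns)
    # Start from the first column; astype(str) guards against non-string data.
    result = df[cols[0]].astype(str)
    # Put token <i> in front of each later column, so the last column
    # is never followed by a token.
    for i, col in enumerate(cols[1:]):
        result = result + f' <{i}> ' + df[col].astype(str)
    return result.rename('features')


# Build a new DataFrame; the original dataframe is not modified.
final_df = join_with_tokens(dataframe).to_frame()
print(final_df)
```

This runs one vectorized string concatenation per column instead of one Python call per row, which is usually the cheaper option when there are many rows and few columns.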

answered Oct 02 '22 10:10

Andrej Kesely
