pandas join DataFrame force suffix?

Tags:

python

pandas

How can I force a suffix on a merge or join? I understand it's possible to provide one if there is a collision, but in my case I'm merging df1 with df2, which doesn't cause any collision, and then merging the result with df2 again, which does trigger the suffixes. I would prefer every merge to apply a suffix, because it gets confusing when I do different combinations.

stgtscc asked Feb 05 '14 21:02



2 Answers

You could force a suffix on the actual DataFrame:

    In [11]: df_a = pd.DataFrame([[1], [2]], columns=['A'])

    In [12]: df_b = pd.DataFrame([[3], [4]], columns=['B'])

    In [13]: df_a.join(df_b)
    Out[13]:
       A  B
    0  1  3
    1  2  4

by appending to its column names:

    In [14]: df_a.columns = df_a.columns.map(lambda x: str(x) + '_a')

    In [15]: df_a
    Out[15]:
       A_a
    0    1
    1    2

Now joins won't need the suffix correction, whether they collide or not:

    In [16]: df_b.columns = df_b.columns.map(lambda x: str(x) + '_b')

    In [17]: df_a.join(df_b)
    Out[17]:
       A_a  B_b
    0    1    3
    1    2    4
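
If you would rather not overwrite the column names on the original frames, the same idea can be sketched with rename, which returns a new DataFrame. This is only an illustration using the df_a and df_b from above, not part of the original session:

```python
import pandas as pd

df_a = pd.DataFrame([[1], [2]], columns=['A'])
df_b = pd.DataFrame([[3], [4]], columns=['B'])

# rename returns a renamed copy, so df_a and df_b keep their original columns
joined = df_a.rename(columns=lambda c: str(c) + '_a').join(
    df_b.rename(columns=lambda c: str(c) + '_b')
)

print(joined.columns.tolist())
print(df_a.columns.tolist())
```

The suffixes are applied only for the join itself, so the same frames can be reused later with different suffixes.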
Andy Hayden answered Sep 19 '22 02:09


As of pandas version 0.24.2, you can add a suffix to the column names of a DataFrame using the add_suffix method.

This makes a one-line merge with a forced suffix more bearable, for example:

 df_merged = df1.merge(df2.add_suffix('_2'))  
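
One thing to watch for: add_suffix renames every column, including any key column, so a merge that relies on shared column names will no longer find a common key. A minimal sketch, assuming two hypothetical frames that share a key column called 'id' (the frames and column names here are made up for illustration):

```python
import pandas as pd

# Hypothetical frames sharing a key column named 'id'
df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})

# The key in df2 becomes 'id_2' after add_suffix, so name both keys explicitly
df_merged = df1.merge(df2.add_suffix('_2'), left_on='id', right_on='id_2')

print(df_merged.columns.tolist())
```

If the frames are aligned on their index instead, passing left_index=True and right_index=True to merge avoids the key renaming issue altogether.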
Renier Botha answered Sep 21 '22 02:09