How can I force a suffix on a merge or join? I understand it's possible to provide one if there is a collision, but in my case I'm merging df1 with df2, which doesn't cause any collision, and then merging with df2 again, which does use the suffixes. I would prefer every merge to apply a suffix, because it gets confusing when I do different combinations, as you can imagine.
Both join and merge can be used to combine two DataFrames. The join method combines them on their indexes by default, whereas the merge method is more versatile and lets us specify columns, besides the index, to join on for both DataFrames.
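To make the difference concrete, here is a minimal sketch. The frames `left` and `right` and their values are made up for illustration; only `merge`, `join`, and `set_index` are actual pandas calls.

```python
import pandas as pd

# Hypothetical example frames; the names and values are illustrative only.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# merge: match rows on a named column (inner join by default).
merged = left.merge(right, on="key")

# join: match rows on the index (left join by default), so move the key
# into the index of both frames first.
joined = left.set_index("key").join(right.set_index("key"))

print(merged)
print(joined)
```

Note the different defaults: merge keeps only the keys found in both frames, while join keeps every row of the calling frame and fills the gaps with NaN.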
As you can see, merge is faster than join. The difference per call is small, but over 4000 iterations it adds up to minutes.
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
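As a rough sketch of the behaviour described above, the example below concatenates two made-up frames (`top` and `bottom` are hypothetical names) along each axis and shows the union and intersection options.

```python
import pandas as pd

# Hypothetical frames that share column "A" but differ in the other column.
top = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
bottom = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# axis=0 stacks rows; join="outer" (the default) keeps the union of columns,
# filling missing cells with NaN.
rows_union = pd.concat([top, bottom], axis=0, ignore_index=True)

# join="inner" keeps only the columns present in every frame.
rows_inner = pd.concat([top, bottom], axis=0, join="inner", ignore_index=True)

# axis=1 places frames side by side, aligning on the row index.
# The suffix keeps the two "A" columns distinguishable.
side_by_side = pd.concat([top, bottom.add_suffix("_bottom")], axis=1)

print(rows_union)
print(rows_inner)
print(side_by_side)
```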
You could force a suffix on the DataFrame itself. Start with two frames whose columns don't collide:
    In [11]: df_a = pd.DataFrame([[1], [2]], columns=['A'])

    In [12]: df_b = pd.DataFrame([[3], [4]], columns=['B'])

    In [13]: df_a.join(df_b)
    Out[13]:
       A  B
    0  1  3
    1  2  4
Then append a suffix to each of its column names:
    In [14]: df_a.columns = df_a.columns.map(lambda x: str(x) + '_a')

    In [15]: df_a
    Out[15]:
       A_a
    0    1
    1    2
Now joins won't need the suffix correction, whether they collide or not:
    In [16]: df_b.columns = df_b.columns.map(lambda x: str(x) + '_b')

    In [17]: df_a.join(df_b)
    Out[17]:
       A_a  B_b
    0    1    3
    1    2    4
As of pandas version 0.24.2 you can add a suffix to column names on a DataFrame using the add_suffix method.
This makes a one-liner merge command with force-suffix more bearable, for example:
df_merged = df1.merge(df2.add_suffix('_2'))
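One caveat: add_suffix renames every column, including any key column, so if the two frames are matched on a shared column the one-liner above leaves merge with no common column to match on. A possible workaround, sketched below, is to suffix only the non-key columns with rename. The helper `merge_with_suffix`, the key name `id`, and the sample data are all hypothetical and not part of pandas.

```python
import pandas as pd


def merge_with_suffix(left, right, on, suffix, how="inner"):
    """Merge `right` into `left` on `on`, suffixing every non-key column of `right`.

    Hypothetical helper for illustration; not part of pandas.
    """
    keys = [on] if isinstance(on, str) else list(on)
    # Rename only the non-key columns so the key still matches on both sides.
    renamed = right.rename(
        columns={c: f"{c}{suffix}" for c in right.columns if c not in keys}
    )
    return left.merge(renamed, on=keys, how=how)


# Made-up sample data: the key column "id" is an assumption.
df1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df2 = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7]})

# Merge df2 twice; each pass gets its own suffix whether or not names collide.
out = merge_with_suffix(df1, df2, on="id", suffix="_first")
out = merge_with_suffix(out, df2, on="id", suffix="_second")

print(out.columns.tolist())
# ['id', 'value', 'score_first', 'score_second']
```

If the frames are matched on their indexes instead, add_suffix alone is enough, since the index is not renamed.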