
Pandas merge columns, but not the 'key' column

Tags:

python

pandas

This may seem like a stupid question, but this has been bugging me for some time.

df1:

imp_type    value
1           abc
2           def
3           ghi

df2:

id          value2
1           123
2           345
3           567

Merging the two DataFrames:

df1.merge(df2, left_on='imp_type',right_on='id')

yields:

imp_type    value    id    value2
1           abc      1     123
2           def      2     345
3           ghi      3     567

Then I need to drop the id column, since it's essentially a duplicate of the imp_type column. Why does merge pull in the join keys from both DataFrames by default? I would think there should at least be a parameter to set to False if you don't want to pull in the join key. Is there something like this already, or am I doing something wrong?
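A minimal reproduction of the merge above, assuming a current pandas version: with differently named keys, both key columns survive the merge, and the usual fix is an explicit drop afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# With left_on/right_on, merge keeps both key columns in the result
merged = df1.merge(df2, left_on='imp_type', right_on='id')
print(list(merged.columns))  # ['imp_type', 'value', 'id', 'value2']

# The redundant right-hand key has to be dropped explicitly afterwards
merged = merged.drop(columns='id')
print(list(merged.columns))  # ['imp_type', 'value', 'value2']
```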

asked Mar 05 '14 at 20:03 by ChrisArmstrong



People also ask

How do I merge only selected columns in pandas?

You can merge two pandas DataFrames on specific columns by passing those column names to the merge() function, either with on (when the names match) or with left_on and right_on (when they differ).
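A minimal sketch of merging only some columns, reusing the question's frames plus a hypothetical extra column 'note' on the right-hand frame: selecting the key and the wanted columns before merging keeps everything else out of the result. The right-hand key 'id' still appears, because the key names differ.

```python
import pandas as pd

left = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
# 'note' is a hypothetical column we do not want in the merged result
right = pd.DataFrame({'id': [1, 2, 3],
                      'value2': [123, 345, 567],
                      'note': ['x', 'y', 'z']})

# Keep only the key plus the wanted columns from the right frame, then merge
result = left.merge(right[['id', 'value2']], left_on='imp_type', right_on='id')
print(list(result.columns))  # ['imp_type', 'value', 'id', 'value2']
```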

How do I get rid of duplicate columns while merging pandas?

Use the merge() function to join the two DataFrames with an inner join, and pass a suffix such as 'remove' for the newly joined columns that have the same name in both DataFrames. Then use the drop() function to remove the columns carrying that suffix, so that no duplicated columns remain in the new DataFrame.
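A minimal sketch of the suffix-then-drop approach, assuming a hypothetical right-hand frame that repeats a 'value' column alongside the join key 'key' (both names are illustrative). An empty left suffix keeps the left copy's original name, and a list comprehension picks out the suffixed columns to drop.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
# Hypothetical right frame that duplicates 'value' and adds 'value2'
right = pd.DataFrame({'key': [1, 2, 3],
                      'value': ['abc', 'def', 'ghi'],
                      'value2': [123, 345, 567]})

# Overlapping non-key columns from the right get '_remove'; the left keeps its names
merged = left.merge(right, on='key', how='inner', suffixes=('', '_remove'))

# Drop every column that came in with the '_remove' suffix
merged = merged.drop(columns=[c for c in merged.columns if c.endswith('_remove')])
print(list(merged.columns))  # ['key', 'value', 'value2']
```

Note that suffixes only apply to overlapping columns that are not shared join keys, so this does not by itself remove the id column from the question, where the two keys have different names.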

What's the difference between PD join and PD merge?

Both join and merge can be used to combine two DataFrames, but join matches them on their indexes by default, whereas merge is more versatile and lets you specify columns, in addition to the index, to join on for both DataFrames.
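A minimal sketch of the difference, using the question's frames: join aligns on the index, so each key column is moved into the index first, and the result carries a single key (the index) instead of two key columns. merge works on the columns directly and keeps both keys when their names differ.

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# join: align on the index, so set each key column as the index first
joined = df1.set_index('imp_type').join(df2.set_index('id'))
print(joined)

# merge: join on columns; both differently named key columns are kept
merged = df1.merge(df2, left_on='imp_type', right_on='id')
print(merged)
```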


1 Answer

I agree it would be nice if one of the columns were dropped. Of course, then there is the question of what to name the remaining column.

Anyway, here is a workaround. Simply rename one of the columns so that the joined column(s) have the same name:

import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})
df2.columns = ['imp_type', 'value2']  # rename 'id' so both key columns share a name
df1.merge(df2, on='imp_type')

which returns:

   imp_type value  value2
0         1   abc     123
1         2   def     345
2         3   ghi     567

Renaming the columns is a bit of a pain, especially (as DSM points out) compared to .drop('id', axis=1). However, if you can arrange for the joined columns to have the same name from the very beginning, then df1.merge(df2, on='imp_type') would be easiest.
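A minimal sketch comparing the two workarounds on the question's frames. Using rename with a mapping, instead of assigning a full column list, touches only the key column and does not depend on column order; the drop variant chains directly onto the merge. Both produce the same frame.

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# Option 1: rename only the key column so both frames share it, then merge on it
renamed = df1.merge(df2.rename(columns={'id': 'imp_type'}), on='imp_type')

# Option 2: merge on the differently named keys, then drop the redundant one
dropped = df1.merge(df2, left_on='imp_type', right_on='id').drop(columns='id')

print(renamed.equals(dropped))  # True
```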

answered Sep 21 '22 at 22:09 by unutbu
