This may seem like a stupid question, but this has been bugging me for some time.
df1:
imp_type value
1 abc
2 def
3 ghi
df2:
id value2
1 123
2 345
3 567
Merging the 2 DataFrames:
df1.merge(df2, left_on='imp_type',right_on='id')
yields:
imp_type value id value2
1 abc 1 123
2 def 2 345
3 ghi 3 567
Then I need to drop the id column, since it's essentially a duplicate of the imp_type column. Why does merge pull in the join key from both DataFrames by default? I would think there should at least be a parameter to set to False if you don't want to keep the join key. Is there something like this already, or am I doing something wrong?
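As far as I know, merge has no flag to suppress the right-hand key, so the usual fix is to chain a drop onto the merge. A minimal sketch using the frames from the question, assuming a reasonably recent pandas; `drop(columns=...)` names the column by keyword instead of passing the axis positionally:

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# Merge on the differently named keys, then discard the right-hand key,
# which holds the same values as imp_type after an equality join.
merged = df1.merge(df2, left_on='imp_type', right_on='id').drop(columns='id')
print(merged)
```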
You can merge two pandas DataFrames on specific columns with the merge function by naming those columns: use on= when the key has the same name in both frames, or left_on/right_on when the names differ.
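When the key columns share names, on= also accepts a list, and each shared key then appears only once in the result. A small sketch; the frames and the column names `year`, `region`, `units` and `target` are hypothetical and only illustrate a two-column key:

```python
import pandas as pd

# Hypothetical frames that share two key columns, 'year' and 'region'.
sales = pd.DataFrame({'year': [2020, 2020, 2021],
                      'region': ['east', 'west', 'east'],
                      'units': [10, 20, 30]})
targets = pd.DataFrame({'year': [2020, 2021, 2021],
                        'region': ['east', 'east', 'west'],
                        'target': [12, 25, 40]})

# Rows match only when both 'year' and 'region' are equal (inner join by
# default); because the key names are shared, no key column is duplicated.
result = sales.merge(targets, on=['year', 'region'])
print(result)
```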
Another option is to use the merge() function to join the two DataFrames with an inner join, adding a suffix such as 'remove' to the columns that have the same name in both DataFrames. Then use the drop() function to remove the columns carrying that suffix. This ensures that no duplicate columns remain in the new DataFrame.
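A sketch of the suffix approach, assuming the overlapping column is a non-key column. The frames `left`/`right` and the `_remove` suffix are illustrative choices. Note that suffixes only apply to names that appear in both frames; in the question the keys are called `imp_type` and `id`, so `id` would not be suffixed and this trick alone would not remove it.

```python
import pandas as pd

# Hypothetical frames in which the non-key column 'value' exists on both sides.
left = pd.DataFrame({'key': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
right = pd.DataFrame({'key': [1, 2, 3],
                      'value': ['abc', 'def', 'ghi'],
                      'value2': [123, 345, 567]})

# Keep left-hand names unchanged and tag right-hand duplicates with '_remove'.
merged = left.merge(right, on='key', how='inner', suffixes=('', '_remove'))

# Drop every column that received the '_remove' suffix.
merged = merged.drop(columns=[c for c in merged.columns if c.endswith('_remove')])
print(merged)
```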
Both join and merge can combine two DataFrames, but join aligns them on their indexes (its on parameter can use a column of the calling frame, but the other frame is always matched on its index), whereas merge is more versatile and lets you specify columns, as well as the index, to join on for both DataFrames.
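Applied to the question's frames, one way to use join is to move each key into the index first, so the key exists only once in the result. A sketch under that assumption; setting the index name explicitly avoids depending on which side's index name pandas keeps when the names differ:

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# Move each key into the index; join() aligns the two frames on their indexes.
joined = df1.set_index('imp_type').join(df2.set_index('id'), how='inner')

# The key now lives only in the index. Name it explicitly and turn it
# back into an ordinary column.
joined.index.name = 'imp_type'
result = joined.reset_index()
print(result)
```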
I agree it would be nice if one of the columns were dropped. Of course, then there is the question of what to name the remaining column.
Anyway, here is a workaround. Simply rename one of the columns so that the joined column(s) have the same name:
import pandas as pd
df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})
# Give df2's key column the same name as df1's key column
df2.columns = ['imp_type', 'value2']
df1.merge(df2, on='imp_type')
# Out:
#    imp_type value  value2
# 0         1   abc     123
# 1         2   def     345
# 2         3   ghi     567
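A variant of the same workaround, offered as a suggestion rather than the answer's exact code: rename() targets the key by name, so it does not depend on df2's column order and leaves the original frame untouched. The check at the end compares it against the merge-then-drop result:

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# rename() returns a new frame with only 'id' renamed; df2 keeps its columns.
renamed = df1.merge(df2.rename(columns={'id': 'imp_type'}), on='imp_type')

# Should be the same frame as merging on left_on/right_on and dropping 'id'.
dropped = df1.merge(df2, left_on='imp_type', right_on='id').drop(columns='id')
pd.testing.assert_frame_equal(renamed, dropped)
print(renamed)
```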
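Renaming the columns is a bit of a pain, especially (as DSM points out) compared to .drop('id', axis=1). (Passing the axis positionally, as in .drop('id', 1), was deprecated in pandas 1.x and is no longer accepted in pandas 2.0.) However, if you can arrange for the joined columns to have the same name from the very beginning, then df1.merge(df2, on='imp_type') would be easiest.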
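If the names cannot be aligned at the source, the rename step can be wrapped once and reused. The helper below, `merge_keep_left_keys`, is hypothetical (not part of pandas); it assumes the right frame has no non-key column that already uses one of the left key names, since the rename would then create a duplicate column name:

```python
import pandas as pd


def merge_keep_left_keys(left, right, left_on, right_on, **kwargs):
    """Hypothetical helper: merge on differently named keys, keeping only
    the left frame's key names. Extra kwargs (e.g. how='left') go to merge."""
    # Accept a single column name or a list of names for either side.
    left_keys = [left_on] if isinstance(left_on, str) else list(left_on)
    right_keys = [right_on] if isinstance(right_on, str) else list(right_on)
    if len(left_keys) != len(right_keys):
        raise ValueError('left_on and right_on must have the same length')
    # Rename right's keys to left's names so merge(on=...) emits each key once.
    mapping = dict(zip(right_keys, left_keys))
    return left.merge(right.rename(columns=mapping), on=left_keys, **kwargs)


df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})
result = merge_keep_left_keys(df1, df2, 'imp_type', 'id')
print(result)
```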