How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?
df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')
This provides a DataFrame like this
But clearly I am merging on UserName
and UserID
so they are the same. I want it to look like this. Is there any clean ways to do this?
Only the ways I can think of are either re-naming the columns to be the same before merge, or droping one of them after merge. I would be nice if pandas automatically drops one of them or I could do something like
pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')
In this approach to prevent duplicated columns from joining the two data frames, the user needs simply needs to use the pd. merge() function and pass its parameters as they join it using the inner join and the column names that are to be joined on from left and right data frames in python.
Different column names are specified for merges in Pandas using the “left_on” and “right_on” parameters, instead of using only the “on” parameter. Merging dataframes with different names for the joining variable is achieved using the left_on and right_on arguments to the pandas merge function.
merge() for combining data on common columns or indices. . join() for combining data on a key column or an index. concat() for combining DataFrames across rows or columns.
How about set the UserID
as index and then join on index for the second data frame?
pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)
# Col1 UserName Col2
# 0 a 1 d
# 1 b 2 e
# 2 c 3 f
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With