pandas merge on columns with different names and avoid duplicates [duplicate]

How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?

import pandas as pd

df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')

This produces a DataFrame that contains both UserName and UserID as separate columns.

But since I am merging on UserName and UserID, the two columns hold the same values. I want the result to keep only one of them. Is there a clean way to do this?

The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like

pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')
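
For reference, the two workarounds mentioned above (rename before the merge, or drop after it) might look roughly like this. This is only a sketch using the example frames from the question; the variable names `renamed` and `dropped` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

# Workaround 1: rename the right key so both frames share one key name,
# then merge with a single `on` column.
renamed = pd.merge(df1, df2.rename(columns={'UserID': 'UserName'}), on='UserName')

# Workaround 2: merge on the differently named keys, then drop the
# redundant right key from the result.
dropped = pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop(columns='UserID')

print(renamed)
print(dropped)
```

Both variants leave a single key column (UserName) in the result; neither modifies df1 or df2 in place.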
asked Oct 11 '16 20:10 by E.K.



1 Answer

How about setting UserID as the index of the second DataFrame and then joining on that index?

pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

#   Col1    UserName    Col2
# 0    a           1       d
# 1    b           2       e
# 2    c           3       f
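
If this comes up often, the same idea could be wrapped in a small helper that approximates the `keep_column='left'` behaviour asked for in the question. This is a sketch only: `merge_keep_left_key` is a hypothetical name, not a pandas function, and it assumes a default inner join.

```python
import pandas as pd


def merge_keep_left_key(left, right, left_on, right_on):
    """Inner-merge on differently named key columns, keeping only the left key.

    Hypothetical helper (not part of pandas): the right key is moved into the
    index, so it does not come back as a separate column in the result.
    """
    return pd.merge(left, right.set_index(right_on),
                    left_on=left_on, right_index=True)


df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

result = merge_keep_left_key(df1, df2, left_on='UserName', right_on='UserID')
print(result)
```

The column order of the printed result may differ from the output shown above, depending on the pandas and Python versions in use.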
answered Sep 27 '22 21:09 by Psidom