pandas merge on columns with different names and avoid duplicates [duplicate]

Tags: python, merge, pandas

How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?

import pandas as pd

df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')

This produces a DataFrame like this:

[image: merged DataFrame containing both the UserName and UserID columns]

But I am merging on UserName and UserID, so the two columns hold the same values. I want the result to look like this. Is there a clean way to do this?

[image: merged DataFrame with a single key column]

The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like this (keep_column is not an actual parameter of pd.merge):

pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')


asked Oct 11 '16 20:10

E.K.

1 Answer

How about setting UserID as the index of the second DataFrame and then joining on that index?

pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

#   Col1    UserName    Col2
# 0    a           1       d
# 1    b           2       e
# 2    c           3       f

answered Sep 27 '22 21:09

Psidom
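For comparison, here is a minimal sketch of the two workarounds mentioned in the question (renaming before the merge, dropping after the merge) next to the index-based approach from the answer. It is not part of the original answer. The variable names renamed, dropped, and indexed are made up for illustration, and the sketch assumes a pandas version recent enough to support drop(columns=...).

```python
import pandas as pd

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

# Option 1: rename the right-hand key so both frames share one key name,
# then merge on that single name.
renamed = pd.merge(df1, df2.rename(columns={'UserID': 'UserName'}), on='UserName')

# Option 2: merge on the differently named keys, then drop the redundant one.
dropped = pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop(columns='UserID')

# Option 3 (the approach from the answer): move the right-hand key into the index.
indexed = pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

for result in (renamed, dropped, indexed):
    print(sorted(result.columns))
```

With the default inner join, all three results contain a single key column. For outer or right joins, Option 2 is likely the riskiest, because rows that exist only in df2 carry their key in UserID, and dropping that column discards it.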
