Merging two pandas dataframes results in "duplicate" columns

Tags: python, pandas

I'm trying to merge two dataframes that contain the same key column. Some of the other columns also have identical headers (although the dataframes don't have an equal number of rows), and after merging, these columns are "duplicated", with the original headers given the suffixes _x, _y, etc.

Does anyone know how to get pandas to drop the duplicate columns in the example below?

This is my Python code:

import pandas as pd

holding_df = pd.read_csv('holding.csv')
invest_df = pd.read_csv('invest.csv')

merge_df = pd.merge(holding_df, invest_df, on='key', how='left').fillna(0)
merge_df.to_csv('merged.csv', index=False)
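In this call only `key` is the join key, so every other column name present in both frames (dept_name, res_name, year) is kept twice and disambiguated by `merge`'s default suffixes. One common workaround is sketched below under stated assumptions: the inline strings are a hypothetical two-row stand-in for holding.csv and invest.csv, and `_dup` is an arbitrary suffix name. Passing `suffixes=('', '_dup')` keeps the left-hand names unchanged, so the right-hand copies can then be dropped by suffix.

```python
import io

import pandas as pd

# Hypothetical inline stand-ins for holding.csv and invest.csv
holding_csv = """key,dept_name,res_name,year,need,holding
DeptA_ResA_2015,DeptA,ResA,2015,1,1
DeptA_ResA_2016,DeptA,ResA,2016,1,1
"""
invest_csv = """key,dept_name,res_name,year,no_of_inv,inv_cost_wo_ice
DeptA_ResA_2015,DeptA,ResA,2015,1,1000000
DeptA_ResB_2015,DeptA,ResB,2015,2,6000000
"""

holding_df = pd.read_csv(io.StringIO(holding_csv))
invest_df = pd.read_csv(io.StringIO(invest_csv))

# Left-hand overlapping columns keep their names (empty suffix);
# right-hand copies get the arbitrary '_dup' suffix instead of '_y'
merge_df = pd.merge(holding_df, invest_df, on='key', how='left',
                    suffixes=('', '_dup'))

# Drop the right-hand copies, then fill unmatched rows as in the question
merge_df = merge_df.drop(columns=[c for c in merge_df.columns if c.endswith('_dup')])
merge_df = merge_df.fillna(0)
print(merge_df.columns.tolist())
```

This assumes no genuine left-hand column name already ends in `_dup`; pick a suffix that cannot collide with real column names.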

And the CSV files contain this:

First rows of the left dataframe (holding_df)

key, dept_name, res_name, year, need, holding
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1
DeptA_ResA_2016, DeptA, ResA, 2016, 1, 1
DeptA_ResA_2017, DeptA, ResA, 2017, 1, 1
...

First rows of the right dataframe (invest_df)

key, dept_name, res_name, year, no_of_inv, inv_cost_wo_ice
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1000000
DeptA_ResB_2015, DeptA, ResB, 2015, 2, 6000000
DeptB_ResB_2015, DeptB, ResB, 2015, 1, 6000000
...

Merged result

key, dept_name_x, res_name_x, year_x, need, holding, dept_name_y, res_name_y, year_y, no_of_inv, inv_cost_wo_ice
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1, DeptA, ResA, 2015.0, 1.0, 1000000.0
DeptA_ResA_2016, DeptA, ResA, 2016, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2017, DeptA, ResA, 2017, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2018, DeptA, ResA, 2018, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2019, DeptA, ResA, 2019, 1, 1, 0, 0, 0.0, 0.0, 0.0
...
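The zeros in the right-hand dept_name_y, res_name_y and year_y columns come from `fillna(0)` on rows that had no match. Another approach is to join on every column the two frames share, so none of them is duplicated; for a left join, those key columns then take the left frame's values even on unmatched rows. The sketch below uses hypothetical inline rows in place of the CSV files and assumes the shared columns always agree whenever `key` matches (a row whose dept_name, res_name or year disagreed would simply fail to match).

```python
import io

import pandas as pd

# Hypothetical inline stand-ins for holding.csv and invest.csv
holding_df = pd.read_csv(io.StringIO("""key,dept_name,res_name,year,need,holding
DeptA_ResA_2015,DeptA,ResA,2015,1,1
DeptA_ResA_2016,DeptA,ResA,2016,1,1
"""))
invest_df = pd.read_csv(io.StringIO("""key,dept_name,res_name,year,no_of_inv,inv_cost_wo_ice
DeptA_ResA_2015,DeptA,ResA,2015,1,1000000
DeptB_ResB_2015,DeptB,ResB,2015,1,6000000
"""))

# Every column present in both frames becomes part of the join key
shared = [c for c in holding_df.columns if c in invest_df.columns]

# Only the right frame's non-shared columns can be NaN here, so fillna(0)
# no longer overwrites dept_name/res_name/year with zeros
merge_df = pd.merge(holding_df, invest_df, on=shared, how='left').fillna(0)
print(merge_df)
```

When `on` is omitted, `pd.merge` already defaults to joining on the intersection of the two frames' column names, so `pd.merge(holding_df, invest_df, how='left')` behaves the same for this data.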


asked Dec 05 '14 10:12

larslovlie

1 Answer

I had the same problem with duplicate columns after left joins, even when the columns' data was identical. After some searching, I found that in pandas 0.14, NaN values are considered different even when both columns contain NaN. Once you upgrade to 0.15, this problem disappears, which would explain why it later worked for you: you probably upgraded.
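A minimal illustration of why NaN-filled columns can compare as "different": element-wise `==` follows IEEE 754, where NaN never equals NaN, while `Series.equals` treats NaNs in the same position as equal. This shows current pandas comparison semantics only; it does not reproduce the 0.14/0.15 version difference described in the answer, and the two sample Series are hypothetical.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, np.nan])
right = pd.Series([1.0, np.nan])

# Element-wise comparison: the NaN positions come out False
print((left == right).tolist())

# Series.equals: NaNs at the same positions count as equal
print(left.equals(right))
```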

answered Oct 11 '22 18:10

desmond
