Merging duplicate columns while reading CSV file

Tags: python, pandas

I have a CSV file with a duplicated column A, and I need to read it while excluding the duplicate column:

 A       A       C
306     306     506
3238    3238    591
4159    4159    366
1847    1847    2898

The available alternatives include usecols and names. However, pandas 0.24.1 also has a mangle_dupe_cols parameter which, according to the docs, should merge duplicate columns when set to False.

But when I do so, I get a ValueError:

pd.read_csv('file.csv', mangle_dupe_cols=False, engine='python').head()
ValueError: Setting mangle_dupe_cols=False is not supported yet

Pandas version used for this problem - 0.24.1

What are your views on this problem?

asked Mar 04 '19 07:03

meW

1 Answer

I checked the pandas GitHub repository and found the issue ENH: Support mangle_dupe_cols=False in pd.read_csv().

Unfortunately, this is the exchange about its ETA in the comments:

What is the ETA on this issue?

when / if a community pull request happens

One possible solution is to read the file twice:

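import pandas as pd

# First pass: read only the header row, so duplicate names are kept as-is
c = pd.read_csv('some.csv', header=None, nrows=1).iloc[0]
# or, with the csv module:
# import csv
# with open('some.csv', newline='') as f:
#     reader = csv.reader(f)
#     c = next(reader)

# Second pass: read the data rows and restore the original column names
df = pd.read_csv('some.csv', header=None, skiprows=1)
df.columns = c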
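
Reading the file twice keeps both A columns under the same name, but it does not drop the duplicate by itself. A minimal sketch of one way to finish the job, assuming the file is comma-separated and that keeping the first occurrence of each name is acceptable (the in-memory string raw is a hypothetical stand-in for some.csv, and nothing here checks that the duplicated columns really hold identical values):

```python
import io

import pandas as pd

# Hypothetical stand-in for 'some.csv': comma-separated, header A appears twice.
raw = "A,A,C\n306,306,506\n3238,3238,591\n4159,4159,366\n1847,1847,2898\n"

# First pass: read only the header row so pandas does not rename the second A.
header = pd.read_csv(io.StringIO(raw), header=None, nrows=1).iloc[0]

# Second pass: read the data rows and restore the original column names.
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1)
df.columns = header.tolist()

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped)
```

Index.duplicated() marks every repeat of a name after its first occurrence, so the boolean mask selects A and C once each. If the duplicated columns might differ, compare them before dropping one.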

answered Oct 13 '22 21:10

jezrael
