
Merging duplicate columns while reading CSV file

Tags: python, pandas

Given a CSV file with a duplicated column A, I need to read the file while excluding the duplicate column:

 A       A       C
306     306     506
3238    3238    591
4159    4159    366
1847    1847    2898

The available alternatives include usecols and names. However, pandas 0.24.1 also has a mangle_dupe_cols parameter which, according to the docs, should merge duplicate columns when set to False.

But when I do so, I get a ValueError:

pd.read_csv('file.csv', mangle_dupe_cols=False, engine='python').head()
ValueError: Setting mangle_dupe_cols=False is not supported yet
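
For comparison, the usecols alternative mentioned above might look like the sketch below. It assumes the column positions are known in advance (here the duplicate is the second column) and that the file is comma-separated. The csv_text string is a hypothetical in-memory stand-in for file.csv so the snippet can be run as is.

```python
import io

import pandas as pd

# In-memory stand-in for file.csv (assumption: comma-separated, same values as the sample)
csv_text = """A,A,C
306,306,506
3238,3238,591
4159,4159,366
1847,1847,2898
"""

# Select columns by position: keep the first "A" (0) and "C" (2), skip the duplicate (1)
df_usecols = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])
print(df_usecols.columns.tolist())
```

Selecting by position avoids having to refer to the duplicated name at all, at the cost of hard-coding the file layout.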

Pandas version used for this problem - 0.24.1

What are your views on this problem?

asked Mar 04 '19 07:03 by meW




1 Answer

I checked the pandas GitHub repository and found the issue ENH: Support mangle_dupe_cols=False in pd.read_csv().

Unfortunately, the question in the comments there got only this reply:

What is the ETA on this issue?

when / if a community pull request happens

One possible solution is to read the file twice:

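import pandas as pd

# first pass: read only the header row to get the original, unmangled names
c = pd.read_csv('some.csv', header=None, nrows=1).iloc[0]
# or, with the csv module (requires import csv):
#with open('some.csv', newline='') as f:
#  reader = csv.reader(f)
#  c = next(reader)

# second pass: read the data rows without a header, then restore the names
df = pd.read_csv('some.csv', header=None, skiprows=1)
df.columns = c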
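
If the goal is to keep only one copy of each repeated column, the restored duplicate labels can then be filtered out. This is a minimal sketch, assuming the repeated columns hold identical data (as in the sample) and the file is comma-separated. The csv_text string is a hypothetical in-memory stand-in for some.csv.

```python
import io

import pandas as pd

# In-memory stand-in for some.csv (assumption: comma-separated)
csv_text = "A,A,C\n306,306,506\n3238,3238,591\n4159,4159,366\n1847,1847,2898\n"

# Same two-pass read as above, against the in-memory text
header = pd.read_csv(io.StringIO(csv_text), header=None, nrows=1).iloc[0]
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)
df.columns = header.tolist()

# Index.duplicated() marks every repeat of a label after its first occurrence
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())
```

This keeps the first occurrence of each name and drops the rest without comparing values, so it is only safe when the repeated columns really are copies of each other.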

answered Oct 13 '22 21:10 by jezrael