Given a CSV file with a duplicate column A, I need to read the file excluding the duplicate column:
A A C
306 306 506
3238 3238 591
4159 4159 366
1847 1847 2898
Available alternatives include usecols and names. However, Pandas version 0.24.1 also has the mangle_dupe_cols parameter, which, if set to False, should merge duplicate columns, as mentioned in the docs.
But when I do so, I get a ValueError:
pd.read_csv('file.csv', mangle_dupe_cols=False, engine='python').head()
ValueError: Setting mangle_dupe_cols=False is not supported yet
Pandas version used for this problem - 0.24.1
What are your views on this problem?
Use the merge() function to join the two data frames with an inner join, adding a suffix such as 'remove' to the newly joined columns that have the same name in both data frames. Then use the drop() function to remove the columns carrying the 'remove' suffix. This ensures that identical columns don't exist in the new dataframe.
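As a rough illustration of the merge-then-drop idea above, here is a minimal sketch. The two frames, the join column "key", and the exact suffix "_remove" are made up for the example; only merge(), its suffixes argument, and drop() come from pandas itself.

```python
import pandas as pd

# Hypothetical frames that share the join key and a duplicated column "A"
left = pd.DataFrame({"key": [1, 2, 3], "A": [306, 3238, 4159]})
right = pd.DataFrame(
    {"key": [1, 2, 3], "A": [306, 3238, 4159], "C": [506, 591, 366]}
)

# Inner join: overlapping non-key columns from the right frame get "_remove"
merged = left.merge(right, on="key", how="inner", suffixes=("", "_remove"))

# Drop every column that carries the suffix
to_drop = [col for col in merged.columns if col.endswith("_remove")]
merged = merged.drop(columns=to_drop)
```

Using an empty suffix for the left frame keeps its column names unchanged, so only the right-hand copies need to be dropped.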
A CSV file is a text file; you can append lines (rows) to an existing file, but to add columns you need to write a whole new file. That means loading the file into a Python object (a dataframe), making the changes there, and then writing the new file. Please provide a sample file/data so that working code can be given, and avoid posting images of data.
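The load, modify, write cycle described above might look like the following sketch. The data and the new column "D" are invented for illustration, and in-memory buffers stand in for real files.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an existing file
raw = "A,C\n306,506\n3238,591\n"

# Load the whole file into a dataframe
df = pd.read_csv(io.StringIO(raw))

# Add a column in memory (here, a made-up sum of the two existing columns)
df["D"] = df["A"] + df["C"]

# Write everything out again as a new CSV
out = io.StringIO()
df.to_csv(out, index=False)
result = out.getvalue()
```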
I checked the pandas GitHub and found the issue ENH: Support mangle_dupe_cols=False in pd.read_csv().
Unfortunately, the only response there is this exchange in the comments:
What is the ETA on this issue?
when / if a community pull request happens
One possible solution is to read the file twice:
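import pandas as pd

# First pass: read only the header row to get the original, unmangled names
c = pd.read_csv('some.csv', header=None, nrows=1).iloc[0]
# or, with the csv module:
# import csv
# with open('some.csv', newline='') as f:
#     reader = csv.reader(f)
#     c = next(reader)

# Second pass: read the data rows without a header, then restore the names
df = pd.read_csv('some.csv', header=None, skiprows=1)
df.columns = c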
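Since the question asks for the duplicate column to be excluded, one possible follow-up is to drop repeated names after restoring them. The sketch below is self-contained and uses an in-memory, comma-separated copy of the sample data in place of a real file. The deduplication step with Index.duplicated() is an addition here, not part of the solution above, and it compares column names only, not values.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question
raw = "A,A,C\n306,306,506\n3238,3238,591\n4159,4159,366\n1847,1847,2898\n"

# First pass: read only the header row so the names are not mangled
header = next(csv.reader(io.StringIO(raw)))

# Second pass: read the data rows without a header, then restore the names
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1)
df.columns = header

# Keep only the first occurrence of each column name
deduped = df.loc[:, ~df.columns.duplicated()]
```

If the duplicated columns might hold different values, it would be safer to compare them before discarding one.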