I have a data file with columns A-G like the one below, but when I read it with pd.read_csv('data.csv') it prints an extra unnamed column at the end for no apparent reason.
colA ColB colC colD colE colF colG Unnamed: 7
44   45   26   26   40   26   46   NaN
47   16   38   47   48   22   37   NaN
19   28   36   18   40   18   46   NaN
50   14   12   33   12   44   23   NaN
39   47   16   42   33   48   38   NaN
I have checked my data file several times, and there is no extra data in any other column. How should I remove this extra column while reading? Thanks
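A likely cause, assumed here because the raw file isn't shown, is a trailing delimiter at the end of every line, header included: pandas then sees an eighth, empty field, names it "Unnamed: 7", and fills it with NaN. The sketch below reproduces that with a hypothetical in-memory CSV and, as one option not taken from the answers in this thread, passes a callable to the usecols parameter of read_csv so the column is skipped while reading.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: every line ends with a trailing comma.
raw = (
    "colA,ColB,colC,colD,colE,colF,colG,\n"
    "44,45,26,26,40,26,46,\n"
    "47,16,38,47,48,22,37,\n"
)

# Default read: the empty eighth field becomes an all-NaN "Unnamed: 7" column.
df_default = pd.read_csv(io.StringIO(raw))
print(df_default.columns.tolist())

# usecols accepts a callable that is evaluated against each column name;
# keep only the columns whose names do not start with "Unnamed".
df = pd.read_csv(io.StringIO(raw), usecols=lambda name: not name.startswith("Unnamed"))
print(df.columns.tolist())
```

Filtering by name rather than by position means the same call keeps working if the number of real columns changes.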
If the unnamed column is the first one ("Unnamed: 0"), the simplest solution is to read it as the index: pass index_col=[0] to read_csv(), and the first column is then used as the index instead of showing up as a data column.
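A typical way to end up with "Unnamed: 0" is a file written by DataFrame.to_csv, which writes the row index as an unlabeled first column by default. This minimal round-trip sketch, using a small hypothetical frame, shows the naive read and the index_col=[0] fix side by side.

```python
import io

import pandas as pd

# Hypothetical frame; to_csv() with no path returns the CSV as a string.
df = pd.DataFrame({"colA": [44, 47], "ColB": [45, 16]})

# to_csv writes the index by default, with an empty header cell for it.
csv_text = df.to_csv()

# Reading it back naively turns that index into an "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())     # ['Unnamed: 0', 'colA', 'ColB']

# index_col=[0] uses the first column as the index instead.
restored = pd.read_csv(io.StringIO(csv_text), index_col=[0])
print(restored.columns.tolist())  # ['colA', 'ColB']
```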
Note, however, that pandas can be tricked into allowing duplicate column names, which becomes a problem if you plan to transfer your data set to another statistical language.
To drop every column whose name starts with "Unnamed" after reading:

df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [162]: df
Out[162]:
   colA  ColB  colC  colD  colE  colF  colG
0    44    45    26    26    40    26    46
1    47    16    38    47    48    22    37
2    19    28    36    18    40    18    46
3    50    14    12    33    12    44    23
4    39    47    16    42    33    48    38
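To make the mask in that one-liner visible, here is a small sketch on a hypothetical frame that mimics the question's output: str.contains builds a boolean array over the column names, and ~ inverts it so .loc keeps only the named columns.

```python
import pandas as pd

# Hypothetical frame: two data columns plus an all-NaN "Unnamed: 7" column.
df = pd.DataFrame(
    {"colA": [44, 47], "ColB": [45, 16], "Unnamed: 7": [float("nan")] * 2}
)

# str.contains treats the pattern as a regex by default; "^" anchors it
# to the start of the name, so only names beginning with "Unnamed" match.
mask = df.columns.str.contains("^Unnamed")
print(mask.tolist())        # [False, False, True]

# ~mask flips the booleans, so .loc keeps every column that did NOT match.
df = df.loc[:, ~mask]
print(df.columns.tolist())  # ['colA', 'ColB']
```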
If the first column in the CSV file holds index values, you can do this instead:
df = pd.read_csv('data.csv', index_col=0)
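If you also control how the file is written, one way to avoid the extra index column in the first place (not part of the original answer) is to pass index=False to DataFrame.to_csv. The frame below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame standing in for the data being saved.
df = pd.DataFrame({"colA": [44, 47], "ColB": [45, 16]})

# index=False leaves the row index out of the CSV text entirely.
csv_text = df.to_csv(index=False)

# A plain read_csv then produces no "Unnamed: 0" column, so index_col is not needed.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.columns.tolist())  # ['colA', 'ColB']
```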
First, find the columns whose names contain 'unnamed' (case-insensitively), then drop them. Note: add inplace=True to the .drop() parameters as well; otherwise drop() returns a new DataFrame and df itself is left unchanged.
df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
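The sketch below, with hypothetical column names, shows what case=False buys and how reassigning the result is an alternative to inplace=True; drop(columns=...) is used here as the keyword equivalent of passing the labels with axis=1.

```python
import pandas as pd

# Hypothetical columns: one pandas-style "Unnamed: 7" and one lowercase variant.
df = pd.DataFrame(
    {"colA": [44, 47], "Unnamed: 7": [None, None], "unnamed_notes": [None, None]}
)

# case=False makes the substring match case-insensitive, so both
# "Unnamed: 7" and "unnamed_notes" are selected.
to_drop = df.columns[df.columns.str.contains("unnamed", case=False)]

# Without inplace=True, drop returns a new DataFrame and leaves df unchanged,
# so either reassign the result (as here) or pass inplace=True.
cleaned = df.drop(columns=to_drop)
print(cleaned.columns.tolist())  # ['colA']
print(len(df.columns))           # 3, df itself was not modified
```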