Method 1: Use the index = False argument But you should also include index = False argument. It will automatically drop the unnamed column in pandas. And if you want to set the index for the dataframe then you can call the df.
if index_col is 0, in this case it means that "1" will be the index of the first column, "2" will be the index for the second column and so on.
To delete rows and columns from DataFrames, Pandas uses the “drop” function. To delete a column, or multiple columns, use the name of the column(s), and specify the “axis” as 1. Alternatively, as in the example below, the 'columns' parameter has been added in Pandas which cuts out the need for 'axis'.
It's the index column, pass pd.to_csv(..., index=False)
to not write out an unnamed index column in the first place, see the to_csv()
docs.
Example:
In [37]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))
Out[37]:
Unnamed: 0 a b c
0 0 0.109066 -1.112704 -0.545209
1 1 0.447114 1.525341 0.317252
2 2 0.507495 0.137863 0.886283
3 3 1.452867 1.888363 1.168101
4 4 0.901371 -0.704805 0.088335
compare with:
In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))
Out[38]:
a b c
0 0.109066 -1.112704 -0.545209
1 0.447114 1.525341 0.317252
2 0.507495 0.137863 0.886283
3 1.452867 1.888363 1.168101
4 0.901371 -0.704805 0.088335
You could also optionally tell read_csv
that the first column is the index column by passing index_col=0
:
In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
Out[40]:
a b c
0 0.109066 -1.112704 -0.545209
1 0.447114 1.525341 0.317252
2 0.507495 0.137863 0.886283
3 1.452867 1.888363 1.168101
4 0.901371 -0.704805 0.088335
This is usually caused by your CSV having been saved along with an (unnamed) index (RangeIndex
).
(The fix would actually need to be done when saving the DataFrame, but this isn't always an option.)
read_csv
with index_col=[0]
argumentIMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0]
argument to pd.read_csv
, this reads in the first column as the index. (Note the square brackets).
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
# Save DataFrame to CSV.
df.to_csv('file.csv')
<!- ->
pd.read_csv('file.csv')
Unnamed: 0 a b c
0 0 x x x
1 1 x x x
2 2 x x x
3 3 x x x
4 4 x x x
# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
Note
You could have avoided this in the first place by usingindex=False
if the output CSV was created in pandas, if your DataFrame does not have an index to begin with:df.to_csv('file.csv', index=False)
But as mentioned above, this isn't always an option.
str.match
If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match
:
df
Unnamed: 0 a b c
0 0 x x x
1 1 x x x
2 2 x x x
3 3 x x x
4 4 x x x
df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')
df.columns.str.match('Unnamed')
# array([ True, False, False, False])
df.loc[:, ~df.columns.str.match('Unnamed')]
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
To get ride of all Unnamed columns, you can also use regex such as df.drop(df.filter(regex="Unname"),axis=1, inplace=True)
Another case that this might be happening is if your data was improperly written to your csv
to have each row end with a comma. This will leave you with an unnamed column Unnamed: x
at the end of your data when you try to read it into a df
.
You can do the following with Unnamed Columns:
file.csv
,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9
#read file
df = pd.read_csv('file.csv')
Method 1: Delete Unnamed Columns
# delete one by one like column is 'Unnamed: 0' so use it's name
df.drop('Unnamed: 0', axis=1, inplace=True)
#delete all Unnamed Columns in a single code of line using regex
df.drop(df.filter(regex="Unnamed"),axis=1, inplace=True)
Method 2: Rename Unnamed Columns
df.rename(columns = {'Unnamed: 0':'Name'}, inplace = True)
If you want to write out with a blank header as in the input file, just choose 'Name' above to be ''.
Simply delete that column using: del df['column_name']
Simple do this:
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With