The following is example of data I have in excel sheet.
A B C
1 2 3
4 5 6
I am trying to get the columns name using the following code:
p1 = list(df1t.columns.values)
the output is like this
[A, B, C, 'Unnamed: 3', 'unnamed 4', 'unnamed 5', .....]
I checked the excel sheet, there is only three columns named A, B, and C. Other columns are blank. Any suggestion?
Just in case anybody stumbles over this problem: The issue can also arise if the excel sheet contains empty cells that are formatted with a background color:
import pandas as pd
df1t = pd.read_excel('test.xlsx')
print(df1t)
A B C Unnamed: 3
0 1 2 3 NaN
1 4 5 6 NaN
One option is to drop the 'Unnamed' columns as described here:
https://stackoverflow.com/a/44272830/11826257
df1t = df1t[df1t.columns.drop(list(df1t.filter(regex='Unnamed:')))]
print(df1t)
A B C
0 1 2 3
1 4 5 6
There is problem some cells are not empty but contains some whitespaces.
If need columns names with filtering Unnamed
:
cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']
Sample with file:
df = pd.read_excel('https://dl.dropboxusercontent.com/u/84444599/file_unnamed_cols.xlsx')
print (df)
A B C Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7
0 4.0 6.0 8.0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']
Another solution:
cols = df.columns[~df.columns.str.startswith('Unnamed:')]
print (cols)
Index(['A', 'B', 'C'], dtype='object')
And for return all columns by cols use:
print (df[cols])
A B C
0 4.0 6.0 8.0
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
And if necessary remove all NaN
s rows:
print (df[cols].dropna(how='all'))
A B C
0 4.0 6.0 8.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With