Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas returning the unnamed columns

Tags:

python

pandas

The following is example of data I have in excel sheet.

A    B   C 
1    2   3 
4    5   6

I am trying to get the columns name using the following code:

p1 = list(df1t.columns.values)

the output is like this

[A, B, C, 'Unnamed: 3', 'unnamed 4', 'unnamed 5', .....] 

I checked the excel sheet, there is only three columns named A, B, and C. Other columns are blank. Any suggestion?

like image 529
Mary Avatar asked Sep 10 '25 19:09

Mary


2 Answers

Just in case anybody stumbles over this problem: The issue can also arise if the excel sheet contains empty cells that are formatted with a background color:

Screenshot of Excel File

import pandas as pd
df1t = pd.read_excel('test.xlsx')
print(df1t)
   A  B  C  Unnamed: 3
0  1  2  3         NaN
1  4  5  6         NaN

One option is to drop the 'Unnamed' columns as described here:
https://stackoverflow.com/a/44272830/11826257

df1t = df1t[df1t.columns.drop(list(df1t.filter(regex='Unnamed:')))]
print(df1t)
   A  B  C
0  1  2  3
1  4  5  6
like image 101
Staehlo Avatar answered Sep 13 '25 09:09

Staehlo


There is problem some cells are not empty but contains some whitespaces.

If need columns names with filtering Unnamed:

cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']

Sample with file:

df = pd.read_excel('https://dl.dropboxusercontent.com/u/84444599/file_unnamed_cols.xlsx')
print (df)
     A    B    C Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7
0  4.0  6.0  8.0        NaN        NaN        NaN        NaN        NaN
1  NaN  NaN  NaN                   NaN        NaN        NaN        NaN
2  NaN  NaN  NaN        NaN                   NaN        NaN           
3  NaN  NaN  NaN        NaN        NaN                              NaN

cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']

Another solution:

cols = df.columns[~df.columns.str.startswith('Unnamed:')]
print (cols)
Index(['A', 'B', 'C'], dtype='object')

And for return all columns by cols use:

print (df[cols])
     A    B    C
0  4.0  6.0  8.0
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN

And if necessary remove all NaNs rows:

print (df[cols].dropna(how='all'))
     A    B    C
0  4.0  6.0  8.0
like image 36
jezrael Avatar answered Sep 13 '25 07:09

jezrael