 

Python Pandas read_excel doesn't recognize null cell

My excel sheet:

   A   B  
1 first second
2
3 
4  x   y  
5  z   j

Python code:

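import pandas as pd

# parse_cols=1 parses up to and including column 1 (A and B); newer pandas uses usecols (e.g. usecols="A:B")
df = pd.read_excel(filename, parse_cols=1)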

returns the correct output:

  first second
0 NaN   NaN
1 NaN   NaN
2 x     y
3 z     j

If I want to work only with the second column:

df = pd.read_excel(filename, parse_cols=[1])  # only column B; usecols=[1] in newer pandas

it returns:

 second
0  y
1  j

I'd like to keep the information about empty Excel rows (NaN in my df) even when I work with only a specific column. If the output loses the NaN rows, that's a problem, for example when using the skiprows parameter.
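One possible workaround, assuming the sheet is small enough to read in full, is to read all the columns and select the one you need afterwards. The sketch below does not touch an Excel file: `full` is a hand-built stand-in for what the first read_excel call above returns, so the names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame read_excel returns when both columns
# are read: sheet rows 2 and 3 are empty, so they come back as all-NaN rows.
full = pd.DataFrame({
    "first": [np.nan, np.nan, "x", "z"],
    "second": [np.nan, np.nan, "y", "j"],
})

# Select the column after reading instead of through the reader's column
# argument: the empty rows survive as NaN, so positions still match the sheet.
second_only = full[["second"]]
print(second_only)
```

The trade-off is reading columns you then discard, which only matters for very wide sheets.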

Thanks

franco_b asked Sep 05 '16 16:09


People also ask

What does read_excel do in pandas?

The pandas read_excel() function reads data from an Excel file into a DataFrame object. An Excel sheet is a two-dimensional table, and a DataFrame likewise represents two-dimensional tabular data.

What does read_excel return in Python?

read_excel() reads an Excel sheet (for example from an .xlsx file) into a pandas DataFrame. Reading a single sheet returns a DataFrame; reading several sheets (sheet_name given as a list, or None for all sheets) returns a dict of DataFrames keyed by sheet.
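When several sheets come back as a dict, a common next step is to loop over it or stack the frames with pd.concat. In this sketch `sheets` is a hand-built stand-in for the result of pd.read_excel(path, sheet_name=None); the sheet names and data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None),
# which maps each sheet name to its DataFrame.
sheets = {
    "Sheet1": pd.DataFrame({"first": ["x", "z"], "second": ["y", "j"]}),
    "Sheet2": pd.DataFrame({"first": ["a"], "second": ["b"]}),
}

# Inspect each sheet separately.
for name, frame in sheets.items():
    print(name, frame.shape)

# Or stack them into one frame; the dict keys become the outer index level.
combined = pd.concat(sheets, names=["sheet", "row"])
print(combined)
```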

How do you check for null values in pandas?

To check for null values in a pandas DataFrame, use isnull() (or its alias isna()), which returns a DataFrame of Boolean values that are True for NaN values. Its inverse, notnull() (alias notna()), returns False for NaN values.
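A short sketch of both checks on a small, made-up frame shaped like the question's data, including one way to flag rows where every cell is empty (such as blank spreadsheet rows):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first": [np.nan, np.nan, "x", "z"],
                   "second": [np.nan, np.nan, "y", "j"]})

# isna()/isnull(): True where a value is missing.
missing = df.isna()
# notna()/notnull(): the inverse, False where a value is missing.
present = df.notna()

# Rows in which every cell is missing, e.g. blank spreadsheet rows.
blank_rows = df.isna().all(axis=1)
print(blank_rows.tolist())  # [True, True, False, False]
```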

Why can't pandas read an Excel file?

Older pandas versions used xlrd as the default engine for reading Excel files. However, xlrd 2.0 removed support for every format except .xls, so calling read_excel on an .xlsx file with that setup raises an error saying the xlsx file type is not supported. Newer pandas versions read .xlsx files with openpyxl by default, and you can also request it explicitly with engine="openpyxl".


1 Answer

The parameter skip_blank_lines=False works for me:

df = pd.read_excel('test.xlsx',
                   parse_cols=1,
                   skip_blank_lines=False)
print(df)

       A       B
0  first  second
1    NaN     NaN
2    NaN     NaN
3      x       y
4      z       j
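skip_blank_lines is a documented parameter of pd.read_csv: True (the default) drops empty lines, while False turns each one into a row of NaN. Whether read_excel accepts it depends on the pandas version; the older releases this answer targets presumably passed it on to their text parser, but it is not part of the read_excel signature in current documentation. The sketch below shows the effect with read_csv on in-memory text laid out like the question's sheet (the data is illustrative).

```python
import io

import pandas as pd

# In-memory text shaped like the sheet: a header, two empty lines, two data rows.
data = "first,second\n\n\nx,y\nz,j\n"

# Default: empty lines are skipped, so the positional gap disappears.
skipped = pd.read_csv(io.StringIO(data))

# skip_blank_lines=False: each empty line becomes an all-NaN row.
kept = pd.read_csv(io.StringIO(data), skip_blank_lines=False)

print(len(skipped), len(kept))  # 2 4
```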

Or, if you need to omit the first row:

df = pd.read_excel('test.xlsx',
                   parse_cols=1,
                   skiprows=1,
                   skip_blank_lines=False)
print(df)

  first second
0   NaN    NaN
1   NaN    NaN
2     x      y
3     z      j
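Here skiprows=1 drops the first line so that the next one becomes the header, while skip_blank_lines=False still keeps the empty rows. A sketch of the same combination with read_csv, on hypothetical in-memory text mirroring the answer's test file (whose first row held the labels A and B):

```python
import io

import pandas as pd

# Same layout as before, plus an extra first line to drop.
data = "A,B\nfirst,second\n\n\nx,y\nz,j\n"

# skiprows=1 drops the "A,B" line, so "first,second" becomes the header;
# skip_blank_lines=False keeps the two empty lines as NaN rows.
df = pd.read_csv(io.StringIO(data), skiprows=1, skip_blank_lines=False)
print(df.columns.tolist(), len(df))  # ['first', 'second'] 4
```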
jezrael answered Sep 19 '22 22:09