
python pandas read_excel returns UnicodeDecodeError on describe()

I love pandas, but I am having real problems with Unicode errors. After loading a file with read_excel(), calling describe() raises the dreaded UnicodeDecodeError:

import pandas as pd
df = pd.read_excel('tmp.xlsx', encoding='utf-8')
df.describe()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)

I figured out that the original Excel file had a non-breaking space (U+00A0, whose UTF-8 encoding starts with the byte 0xc2 reported in the traceback) at the end of many cells, probably to prevent long digit strings from being converted to float.
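To illustrate why a trailing non-breaking space can produce this kind of message, here is a minimal sketch that uses only the standard library. The cell value is made up for illustration and is not taken from the original spreadsheet.

```python
# Hypothetical cell content: a long digit string followed by a
# non-breaking space (U+00A0), as described in the question.
NBSP = u"\u00a0"
cell = u"12345678901234567890" + NBSP

# Encoded as UTF-8, the non-breaking space becomes the bytes 0xC2 0xA0.
raw = cell.encode("utf-8")

# Decoding those bytes as ASCII fails on the first non-ASCII byte, 0xC2.
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The printed message names byte 0xc2, the same byte as in the traceback, which suggests that somewhere in the pipeline UTF-8 bytes are being decoded with the ASCII codec.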

One way around this is to strip the cells, but there must be something better.

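# Strip whitespace from text (object) columns only; numeric columns have no .str accessor
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].str.strip()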
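A cell-level helper is another way to limit the stripping to text values. The sketch below is an assumption-laden illustration rather than a pandas feature: `strip_nbsp` and `STRIP_CHARS` are hypothetical names, and the helper assumes Python 3 `str` values.

```python
import string

# Characters to remove from both ends of a text cell:
# ordinary ASCII whitespace plus the non-breaking space (U+00A0).
NBSP = u"\u00a0"
STRIP_CHARS = string.whitespace + NBSP


def strip_nbsp(value):
    """Strip surrounding whitespace and non-breaking spaces from text.

    Non-text values (numbers, None, dates) are returned unchanged.
    """
    if isinstance(value, str):
        return value.strip(STRIP_CHARS)
    return value


print(repr(strip_nbsp(u"12345678901234567890" + NBSP)))
print(repr(strip_nbsp(3.14)))
```

With pandas, such a helper could presumably be applied to every cell with `df.applymap(strip_nbsp)` (or `df.map(strip_nbsp)` in recent pandas releases), which avoids calling `.str` on numeric columns.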

I am using Anaconda 2.2.0 (win64) with pandas 0.16.

hsinger asked Jun 10 '15 19:06





1 Answer

Try this method suggested here:

import sys
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())
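The value returned by sys.getfilesystemencoding() depends on the platform and Python version, so it may be worth checking what would actually be passed as the encoding. A minimal sketch, using only the standard library (the variable names are illustrative):

```python
import codecs
import sys

# Ask Python which encoding it uses for file system names on this machine.
fs_encoding = sys.getfilesystemencoding()

# Look the name up in the codec registry to confirm it is a usable codec.
codec_info = codecs.lookup(fs_encoding)

print(fs_encoding, "->", codec_info.name)
```

Whether this encoding matches the data in the workbook is a separate question; this only shows what the call above would supply.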
skytaker answered Oct 13 '22 00:10
