I love pandas, but I am having real problems with Unicode errors. read_excel() returns the dreaded Unicode error:
import pandas as pd
df=pd.read_excel('tmp.xlsx',encoding='utf-8')
df.describe()
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)
I figured out that the original Excel file had a non-breaking space (U+00A0) at the end of many cells, probably to avoid conversion of long digit strings to float.
One way around this is to strip the cells, but there must be something better.
for col in df.columns:
    df[col] = df[col].str.strip()
I am using anaconda2.2.0 win64, with pandas 0.16
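For context, the byte 0xc2 in the traceback is consistent with the non-breaking space explanation: U+00A0 encodes to the two bytes 0xC2 0xA0 in UTF-8, and the ASCII codec rejects any byte above 0x7F. A minimal sketch of that behaviour follows; the helper name `ascii_decode_fails` is made up for illustration and is not part of pandas.

```python
# Non-breaking space, written as an escape so the source file stays ASCII.
nbsp = u"\u00a0"

# UTF-8 encodes U+00A0 as two bytes, the first of which is 0xC2.
encoded = nbsp.encode("utf-8")


def ascii_decode_fails(raw):
    """Return True if `raw` (bytes) cannot be decoded with the ASCII codec."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# A cell value like the ones described above: digits plus a trailing NBSP.
cell_bytes = u"00123\u00a0".encode("utf-8")
print(repr(encoded), ascii_decode_fails(cell_bytes))
```

So the error most likely comes from some step decoding the cell contents as ASCII, not from the Excel file being unreadable.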
The read_excel() function from the pandas library reads Excel files, such as files in the .xls or .xlsx format. It takes the file path as the first argument and the sheet to read as the second. It reads the Excel file and returns its contents as a DataFrame.
We can use the pandas read_excel() function to read Excel file data into a DataFrame object. An Excel sheet is a two-dimensional table, and a DataFrame likewise represents a two-dimensional tabular data structure.
Read an Excel file into a pandas DataFrame. Supports xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions read from a local filesystem or URL.
Try this method suggested here:
import sys
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())
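If the encoding argument does not help, the stripping workaround from the question can be made a little safer by touching only text cells, so numeric columns are left alone. This is a rough sketch assuming Python 3 and a DataFrame that has already been loaded; `strip_nbsp` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

NBSP = u"\u00a0"  # non-breaking space


def strip_nbsp(df):
    """Return a copy of `df` with NBSP and surrounding whitespace removed from text cells."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Only string values are modified; numbers and missing values pass through.
        cleaned[col] = cleaned[col].map(
            lambda v: v.replace(NBSP, u" ").strip() if isinstance(v, str) else v
        )
    return cleaned


# Small stand-in for the data described in the question (long digit strings
# kept as text by a trailing non-breaking space).
sample = pd.DataFrame({"id": [u"00123\u00a0", u"456"], "n": [1, 2]})
result = strip_nbsp(sample)
print(result)
```

Working on a copy keeps the original frame available in case the trailing spaces are needed to tell text-like IDs apart from real numbers.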