pandas read excel: do not parse numbers

I am working with Python pandas and MS Excel to edit an xlsx file, switching back and forth between the two programs. The file contains some columns with text that looks like numbers, e.g.,

[image: spreadsheet column A containing the text values 001 and 100]

If I read this, I get

pd.read_excel('test.xlsx')
     A
0    1
1  100

and

pd.read_excel('test.xlsx').dtypes
A    int64
dtype: object

My question is: how is it possible to read the text as text? It is not an option to parse it back after reading, because part of the information (i.e., the leading zeros) is lost upon conversion to a number.
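To illustrate why parsing back is not an option, here is a minimal sketch in plain Python. The values "01" and "1" are made up for illustration and are not from the file above.

```python
# Distinct text codes collapse to the same integer on conversion...
codes = ["001", "01", "1"]
as_ints = [int(code) for code in codes]

# ...so converting back to text cannot tell them apart again.
round_trip = [str(n) for n in as_ints]
print(round_trip)
```

Once the values are integers, nothing records how many leading zeros each one had.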

Thank you for your help.

Felix asked Jul 01 '14 11:07


2 Answers

You can work around the known issue (assuming that you know the column name) by using the 'converters' parameter:

>>> pd.read_excel('test.xlsx', converters={'A': str})
     A
0  001
1  100
>>> pd.read_excel('test.xlsx', converters={'A': str}).dtypes
A    object
dtype: object
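A self-contained sketch of this workaround, assuming a reasonably recent pandas with openpyxl installed. The temporary workbook is made up for illustration and stands in for test.xlsx. It also tries dtype=str, which read_excel accepts in newer pandas versions and which may help when the column names are not known in advance.

```python
import os
import tempfile

import pandas as pd

# Build a small workbook whose column A holds text cells, standing in for
# the question's test.xlsx (the temporary path is hypothetical).
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
pd.DataFrame({"A": ["001", "100"]}).to_excel(path, index=False)

# Per-column converter, as in the answer: each cell in A is passed through str.
as_text = pd.read_excel(path, converters={"A": str})

# Whole-sheet alternative: read every column as text.
all_text = pd.read_excel(path, dtype=str)

print(as_text["A"].tolist())
print(all_text["A"].tolist())
```

The converters form touches only the named columns, so genuinely numeric columns elsewhere in the sheet are still parsed as numbers.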
D Read answered Oct 26 '22 11:10


According to this issue, it's a known problem with pandas.

RJT answered Oct 26 '22 09:10