Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas: how to specify data types when reading an Excel file?

I am importing an excel file into a pandas dataframe with the pandas.read_excel() function.

One of the columns is the primary key of the table: it's all numbers, but it's stored as text (the little green triangle in the top left of the Excel cells confirms this).

However, when I import the file into a pandas dataframe, the column gets imported as a float. This means that, for example, '0614' becomes 614.

Is there a way to specify the datatype when importing a column? I understand this is possible when importing CSV files but couldn't find anything in the syntax of read_excel().

The only solution I can think of is to add an arbitrary letter at the beginning of the text (converting '0614' into 'A0614') in Excel, to make sure the column is imported as text, and then chopping off the 'A' in python, so I can match it to other tables I am importing from SQL.

like image 959
Pythonista anonymous Avatar asked Sep 15 '15 16:09

Pythonista anonymous


People also ask

How do you read data from Excel file in Python using pandas?

To read an excel file as a DataFrame, use the pandas read_excel() method. You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts this to the DataFrame structure, which is a tabular like structure.

How does pandas determine data type?

To check the data type in pandas DataFrame we can use the “dtype” attribute. The attribute returns a series with the data type of each column. And the column names of the DataFrame are represented as the index of the resultant series object and the corresponding data types are returned as values of the series object.

Can pandas series hold different data types?

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.).


1 Answers

You just specify converters. I created an excel spreadsheet of the following structure:

names   ages bob     05 tom     4 suzy    3 

Where the "ages" column is formatted as strings. To load:

import pandas as pd  df = pd.read_excel('Book1.xlsx',sheetname='Sheet1',header=0,converters={'names':str,'ages':str}) >>> df        names ages    0   bob   05    1   tom   4    2   suzy  3 
like image 162
tnknepp Avatar answered Sep 26 '22 23:09

tnknepp