
Specifying data type in Pandas csv reader

Tags: python, pandas

I am just getting started with Pandas and I am reading in a CSV file using the read_csv() method. The difficulty I am having is preventing pandas from converting my telephone numbers to large numbers instead of keeping them as strings. I defined a converter that just left the numbers alone, but they were still converted to numbers. When I changed my converter to prepend a 'z' to the phone numbers, they stayed strings. Is there some way to keep them as strings without modifying the values of the fields?

Gardner asked May 14 '12 21:05




2 Answers

Since pandas 0.11.0 you can use the dtype argument to explicitly specify the data type for each column:

d = pandas.read_csv('foo.csv', dtype={'BAR': 'S10'}) 
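
As a minimal sketch of the same idea applied to the phone-number case from the question: passing `str` as the dtype for a column asks pandas to keep the raw text, so leading zeros survive. The column names and the in-memory CSV text below are made up for illustration and stand in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "name,phone\nAlice,0015550100\nBob,4155550199\n"

# Ask pandas to keep the 'phone' column as text instead of inferring a number.
df = pd.read_csv(io.StringIO(csv_text), dtype={"phone": str})

print(df["phone"].tolist())  # ['0015550100', '4155550199']
```

To keep every column as text, `dtype=str` can be passed instead of a per-column dictionary.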
zero323 answered Oct 09 '22 08:10



It looks like you can't prevent pandas from trying to convert numeric/boolean values in the CSV file. Take a look at the pandas source code for the IO parsers, in particular the functions _convert_to_ndarrays and _convert_types: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py

You can always assign the type you want after you have read the file:

df.phone = df.phone.astype(str) 
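
One caveat with converting after the fact is that the values have already been parsed as integers by then, so anything lost during parsing (such as leading zeros) does not come back. A small sketch with made-up data, assuming a column named `phone`:

```python
import io

import pandas as pd

# Hypothetical sample data; the first number starts with two zeros.
csv_text = "name,phone\nAlice,0015550100\nBob,4155550199\n"

# With no dtype given, pandas infers an integer column for 'phone'.
parsed = pd.read_csv(io.StringIO(csv_text))

# Casting afterwards yields strings, but of the already-parsed integers.
parsed["phone"] = parsed["phone"].astype(str)

print(parsed["phone"].tolist())  # ['15550100', '4155550199']
```

For data like phone numbers, where the exact characters matter, this approach is probably only safe when no value has leading zeros or other formatting that integer parsing would discard.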
lbolla answered Oct 09 '22 08:10
