I am just getting started with pandas and I am reading in a CSV file using the read_csv() method. The difficulty I am having is preventing pandas from converting my telephone numbers to large integers instead of keeping them as strings. I defined a converter that just left the numbers alone, but they were still converted to numbers. When I changed my converter to prepend a 'z' to the phone numbers, they stayed strings. Is there some way to keep them as strings without modifying the values of the fields?
astype casts a pandas object to a specified dtype. Use a numpy.dtype or Python type to cast the entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more columns to column-specific types.
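As a rough illustration of the two forms of astype described above, here is a minimal sketch. The DataFrame and its column names (id, score) are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical data: "score" holds numbers stored as text.
df = pd.DataFrame({"id": [1, 2, 3], "score": ["1.5", "2.0", "3.25"]})

# One type for the whole object: every value becomes a string.
all_text = df.astype(str)

# Per-column mapping: only the listed columns are cast.
typed = df.astype({"id": "int32", "score": "float64"})

print(all_text["id"].tolist())
print(typed.dtypes)
```

The mapping form is usually the safer choice when only some columns need a different type, since unlisted columns are left as they are.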
There is no datetime dtype to set for read_csv, because a CSV file holds only text, which pandas parses as strings, integers, and floats. To get datetime values, list the column in the parse_dates argument instead of dtype; otherwise pandas reads the column as an object column, meaning you end up with strings.
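For date columns, read_csv has a separate parse_dates argument. The sketch below is only an illustration of it: the CSV text and the column names (when, value) are invented for the example, and an in-memory buffer stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content with one date-like column.
csv_text = "when,value\n2021-03-04,10\n2021-03-05,12\n"

# Without parse_dates the column is read as plain strings.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the listed column is parsed into datetime values.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(type(as_text["when"].iloc[0]))
print(as_dates["when"].dtype)
```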
Since pandas 0.11.0 you can use the dtype argument to explicitly specify the data type for each column:
d = pandas.read_csv('foo.csv', dtype={'BAR': 'S10'})
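Applied to the phone-number case, this might look like the sketch below. It is an illustration under assumptions: the sample data and the column name phone are invented, an in-memory buffer replaces foo.csv, and it passes the Python str type rather than the 'S10' shown above.

```python
import io
import pandas as pd

# Hypothetical sample data; note the leading zero in the first number.
csv_text = "name,phone\nAlice,0123456789\nBob,5551234567\n"

# Default behaviour: the phone column is inferred as integers,
# so the leading zero is lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, the column is kept as text exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"phone": str})

print(inferred["phone"].tolist())
print(as_text["phone"].tolist())
```

Because the values are never parsed as numbers, nothing about the field contents has to be changed to keep them as strings.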
It looks like you can't prevent pandas from trying to convert numeric/boolean values in the CSV file. Take a look at the pandas source code for the IO parsers, in particular the functions _convert_to_ndarrays and _convert_types: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py
You can always assign the type you want after you have read the file:
df.phone = df.phone.astype(str)
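One caveat with casting after the read: the values have already been parsed as numbers by then, so anything lost during parsing, such as a leading zero, does not come back. A small sketch with made-up sample data:

```python
import io
import pandas as pd

# Hypothetical sample data with a leading zero in the phone number.
csv_text = "name,phone\nAlice,0123456789\n"

df = pd.read_csv(io.StringIO(csv_text))

# The column was read as an integer, so the cast produces the
# string form of that integer, not the original text.
df["phone"] = df["phone"].astype(str)

print(df["phone"].tolist())
```

If the original text matters, specifying the type at read time is probably the better fit.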