Using DataFrame. You can convert the column “Fee” to a string by simply using DataFrame. apply(str) , for example df["Fee"]=df["Fee"].
Pandas uses other names for data types than Python, for example: object for textual data. A column in a DataFrame can only have one data type. The data type in a DataFrame's single column can be checked using dtype . Make conscious decisions about how to manage missing data.
We can use the parse_dates parameter to convince pandas to turn things into real datetime types. parse_dates takes a list of columns (since you could want to parse multiple columns into datetimes ).
Just want to reiterate this will work in pandas >= 0.9.1:
In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]:
ID
0 00013007854817840016671868
1 00013007854817840016749251
2 00013007854817840016754630
3 00013007854817840016781876
4 00013007854817840017028824
5 00013007854817840017963235
6 00013007854817840018860166
I'm creating an issue about detecting integer overflows also.
EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
Update as it helps others:
To have all columns as str, one can do this (from the comment):
pd.read_csv('sample.csv', dtype = str)
To have most or selective columns as str, one can do this:
# lst of column names which needs to be string
lst_str_cols = ['prefix', 'serial']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str' for x in lst_str_cols}
# use dict on dtypes
pd.read_csv('sample.csv', dtype=dict_dtypes)
This probably isn't the most elegant way to do it, but it gets the job done.
In[1]: import numpy as np
In[2]: import pandas as pd
In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])
In[4]: df
Out[4]:
ID
0 00013007854817840016671868
1 00013007854817840016749251
2 00013007854817840016754630
3 00013007854817840016781876
4 00013007854817840017028824
5 00013007854817840017963235
6 00013007854817840018860166
Just replace '/Users/spencerlyon2/Desktop/test.csv'
with the path to your file
Since pandas 1.0 it became much more straightforward. This will read column 'ID' as dtype 'string':
pd.read_csv('sample.csv',dtype={'ID':'string'})
As we can see in this Getting started guide, 'string' dtype has been introduced (before strings were treated as dtype 'object').
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With