Import pandas dataframe column as string not int

People also ask

Can we convert DataFrame to string in Python?

You can convert the column "Fee" to strings by calling apply(str) on it, for example df["Fee"] = df["Fee"].apply(str).

Can a pandas column have different data types?

Pandas uses other names for data types than Python, for example: object for textual data. A column in a DataFrame can only have one data type. The data type in a DataFrame's single column can be checked using dtype . Make conscious decisions about how to manage missing data.

What does Parse_dates in pandas do?

We can use the parse_dates parameter to tell pandas to turn columns into real datetime types. parse_dates takes a list of columns, since you may want to parse several columns into datetimes.


Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm creating an issue about detecting integer overflows also.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
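
For anyone who wants to try this without a file on disk, here is a minimal sketch. The in-memory CSV text and the short `code` column are made up for illustration; only the `ID` values are taken from the output above. It compares default type inference with an explicit `dtype` of `object`:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for sample.csv; the `code` column is invented
# to show what default inference does to short zero-padded values.
csv_text = (
    "ID,code\n"
    "00013007854817840016671868,007\n"
    "00013007854817840016749251,042\n"
)

# Default inference: `code` is parsed as an integer, so its leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit object dtype: both columns keep the exact text from the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"ID": object, "code": object})

print(inferred["code"].tolist())
print(as_text["code"].tolist())
print(as_text["ID"].tolist())
```

How the long `ID` values come out under default inference depends on the pandas version, which is why the sketch only relies on the explicit `dtype` for that column.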

Update, since this has helped others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype = str)

To have only selected columns as str, one can do this:

# list of column names that need to be read as strings
lst_str_cols = ['prefix', 'serial']
# use a dict comprehension to build the column -> dtype mapping
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the mapping as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
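
A quick way to check the selective approach is to run it on a small in-memory CSV. This is only a sketch: the CSV contents and the `count` column are assumptions made up for the example, not data from the question.

```python
import io

import pandas as pd

# Hypothetical CSV contents; `count` is an invented numeric column
# that is deliberately left out of the dtype mapping.
csv_text = (
    "prefix,serial,count\n"
    "001,000045,3\n"
    "002,000046,5\n"
)

lst_str_cols = ["prefix", "serial"]
dict_dtypes = {col: str for col in lst_str_cols}

df = pd.read_csv(io.StringIO(csv_text), dtype=dict_dtypes)

# The listed columns keep their leading zeros; `count` is still inferred as an integer.
print(df["prefix"].tolist())
print(df["serial"].tolist())
print(df["count"].tolist())
```

Columns that are not in the mapping still go through the usual type inference, so only the ones you name are forced to strings.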

This probably isn't the most elegant way to do it, but it gets the job done.

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])

In[4]: df
Out[4]: 
                       ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file.


Since pandas 1.0 this has been much more straightforward. The following reads column 'ID' as dtype 'string':

pd.read_csv('sample.csv', dtype={'ID': 'string'})

As the Getting started guide shows, a dedicated 'string' dtype has been introduced (before that, strings were stored as dtype 'object').
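
To confirm what you get back, something along these lines should work on pandas 1.0 or later. The in-memory CSV is a stand-in I made up for sample.csv, reusing two of the ID values shown earlier on this page:

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv.
csv_text = "ID\n00013007854817840016671868\n00013007854817840016749251\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "string"})

# The column uses the dedicated string extension dtype rather than plain object.
print(df["ID"].dtype)
print(isinstance(df["ID"].dtype, pd.StringDtype))
print(df["ID"].tolist())
```

The values keep their leading zeros just as with dtype=object or dtype=str; the difference is the column's dtype, which is the string extension type.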