Pandas Dataframe interpreting columns as float instead of String

Question

I want to import a CSV file into a pandas DataFrame. There is a column of IDs that consist only of digits, but not every row has an ID.

   ID      xyz
0  12345     4.56
1           45.60
2  54231   987.00

I want to read this column as a string, but even if I specify it with

df=pd.read_csv(filename,dtype={'ID': str})

I get

   ID         xyz
0  '12345.0'    4.56
1   NaN        45.60
2  '54231.0'  987.00

Is there an easy way to get the ID as a string without the decimal, like '12345', without having to edit the strings after importing the table?

Joe · Accepted Answer

One solution is to convert the column after you have imported the df. If the column has no missing values:

df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)

Since your column contains NaN, astype(int) raises an error on it, so convert only the non-missing values:

df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
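
As an alternative sketch (not part of the original answer), newer pandas versions have the nullable integer dtype 'Int64', which can hold missing values, so the column never has to become float. This assumes a pandas version that supports 'Int64' and the 'string' dtype (roughly 1.0 or later); csv_text is made-up sample data standing in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "ID,xyz\n12345,4.56\n,45.60\n54231,987.00\n"

# Read ID as a nullable integer so the empty cell becomes <NA>
# instead of forcing the whole column to float.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "Int64"})

# Convert to the nullable string dtype; missing values stay missing.
df["ID"] = df["ID"].astype("string")

print(df["ID"].tolist())
```

The IDs should come out as plain digit strings with the missing row left as a missing value, so no lambda is needed.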

jezrael · Answer

A possible solution, if there are no missing values in the numeric columns: add the parameter keep_default_na=False so that empty values are kept as empty strings instead of being converted to NaN. Note that this turns off the NaN conversion for all columns, not only the ID column, so check the read_csv docs as well:

import io
import pandas as pd

temp = """ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# io.StringIO is used here because newer pandas versions no longer provide pd.compat.StringIO
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), sep=";", dtype={'ID': str}, keep_default_na=False)
print(df)
      ID     xyz
0  12345    4.56
1          45.60
2  54231  987.00
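
If the numeric columns can also contain empty cells, one way around the caveat above is to pass na_values as a per-column dict together with keep_default_na=False. This is only a sketch on made-up data (csv_text is hypothetical, with one missing ID and one missing xyz value), and it assumes empty cells are the only missing-value marker in the file.

```python
import io

import pandas as pd

# Hypothetical sample data: one missing ID and one missing xyz value.
csv_text = "ID;xyz\n12345;4.56\n;45.60\n54231;\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    dtype={"ID": str},
    keep_default_na=False,    # do not apply the default NaN markers anywhere
    na_values={"xyz": [""]},  # but still treat empty xyz cells as NaN
)

print(df)
```

With this, the ID column should keep its empty string while the empty xyz cell is still read as NaN.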

EDIT:

For me, your solution works perfectly in pandas 0.23.4, so this looks like a bug in older pandas versions:

import io
import pandas as pd

temp = """ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# io.StringIO is used here because newer pandas versions no longer provide pd.compat.StringIO
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), sep=";", dtype={'ID': str})
print(df)
      ID     xyz
0  12345    4.56
1    NaN   45.60
2  54231  987.00
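
If upgrading pandas is not an option, another thing that might be worth trying is the converters parameter of read_csv, which passes each raw field of a column to a function as text. The sketch below uses made-up data (csv_text is hypothetical), and whether it sidesteps the problem on a particular old pandas version is an assumption that has to be checked there.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "ID;xyz\n12345;4.56\n;45.60\n54231;987.00\n"

# The converter receives each raw ID field as text and returns it unchanged,
# so the ID values are never parsed as numbers.
df = pd.read_csv(io.StringIO(csv_text), sep=";", converters={"ID": str})

print(df["ID"].tolist())
```

Note that with a converter the empty cell stays an empty string rather than becoming NaN.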

Tags: python, types, python-3.x, pandas, dataframe

Georg B
