I want to import a CSV file into a pandas DataFrame. There is a column with IDs, which consist only of digits, but not every row has an ID.
      ID     xyz
0  12345    4.56
1          45.60
2  54231  987.00
I want to read this column as a string, but even if I specify it with
df=pd.read_csv(filename,dtype={'ID': str})
I get
          ID     xyz
0  '12345.0'    4.56
1        NaN   45.60
2  '54231.0'  987.00
Is there an easy way to get the ID as a string without the decimal part, like '12345',
without having to edit the strings after importing the table?
One option is to convert the column, although this happens after the DataFrame has been imported:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Because the column contains NaN, astype(int) raises an error here, so convert only the non-missing values instead:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
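For anyone who wants to try the post-import conversion without a file on disk, here is a minimal self-contained sketch. The sample data and the semicolon separator are assumptions made for illustration; they are not taken from the asker's actual file.

```python
from io import StringIO

import pandas as pd

# Assumed sample data: three rows, the second one has no ID.
sample = "ID;xyz\n12345;4.56\n;45.60\n54231;987.00"

# Without a dtype, the ID column is parsed as float64 because of the missing value.
df = pd.read_csv(StringIO(sample), sep=";")

# Convert only the non-missing values: float -> int -> str; NaN is left untouched.
df["ID"] = df["ID"].apply(lambda x: x if pd.isnull(x) else str(int(x)))

ids = df["ID"].tolist()
print(ids)
```

The missing ID stays a missing value rather than becoming a string, so checks such as df['ID'].isnull() keep working afterwards.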
A possible solution, if there are no missing values in the numeric columns: add the parameter keep_default_na=False
so that empty values are kept as empty strings instead of being converted to NaN. Note that this applies to all columns, not only the first one, so nothing in the data is converted to NaN. Check the docs as well:
from io import StringIO
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# io.StringIO is used here because pd.compat.StringIO is no longer available in current pandas
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", dtype={'ID': str}, keep_default_na=False)
print(df)
      ID     xyz
0  12345    4.56
1          45.60
2  54231  987.00
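To illustrate the caveat that keep_default_na=False affects every column, here is a small sketch that reads the same data with and without the parameter. The sample data (including the missing xyz value in the last row) is made up for this comparison.

```python
from io import StringIO

import pandas as pd

# Assumed sample data: one row without an ID, one row without an xyz value.
sample = "ID;xyz\n12345;4.56\n;45.60\n54231;"

# Default NA handling: empty fields become NaN, so xyz stays a float column.
default_df = pd.read_csv(StringIO(sample), sep=";", dtype={"ID": str})

# keep_default_na=False: empty fields are kept as empty strings in ALL columns,
# so xyz can no longer be parsed as a float column.
raw_df = pd.read_csv(
    StringIO(sample), sep=";", dtype={"ID": str}, keep_default_na=False
)

print(default_df.dtypes)
print(raw_df.dtypes)
```

So this option is only safe when the numeric columns have no gaps; otherwise those columns would need to be converted back afterwards, for example with pd.to_numeric.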
EDIT:
For me, in pandas 0.23.4, your original approach works correctly, which suggests a bug in older pandas versions:
from io import StringIO
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# io.StringIO is used here because pd.compat.StringIO is no longer available in current pandas
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", dtype={'ID': str})
print(df)
      ID     xyz
0  12345    4.56
1    NaN   45.60
2  54231  987.00
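If upgrading pandas is not an option, a fallback is to strip the trailing ".0" from the strings after reading. This is only a sketch: strip_float_suffix is a hypothetical helper name, and the input Series is built by hand to mimic the output shown in the question rather than produced by an old pandas version.

```python
import numpy as np
import pandas as pd


def strip_float_suffix(ids):
    """Remove a trailing '.0' from string IDs; missing values stay missing."""
    return ids.str.replace(r"\.0$", "", regex=True)


# Hand-built Series that mimics what the question reports for older pandas.
older = pd.Series(["12345.0", np.nan, "54231.0"], dtype=object)
cleaned = strip_float_suffix(older)

# IDs that are already clean pass through unchanged.
already_clean = strip_float_suffix(pd.Series(["12345", "54231"], dtype=object))

print(cleaned.tolist())
```

Because the pattern is anchored at the end of the string, the helper gives the same result whether or not the installed pandas version appends the decimal part.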