I have cleaned a dataset and had to replace a lot of NaN values with None. After that I saved it to a new CSV file. When I read the cleaned dataset back using pandas.read_csv, all the None values are represented as NaN. How can I avoid this?
This is what the pandas documentation gives:
na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.
In applied data science, you will usually have missing data. For example, an industrial application with sensors will have sensor data that is missing on certain days. You have a couple of alternatives to work with missing data.
You can use the parameters keep_default_na and na_values in read_csv so that the string 'None' is not parsed as NaN, and then replace the strings 'None' with None values:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas versions
temp = u"""a,b
None,NaN
a,8"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), keep_default_na=False, na_values=['NaN'])
print(df)
#       a    b
# 0  None  NaN
# 1     a  8.0
print(type(df.a.iloc[0]))
# <class 'str'>
df = df.replace({'None': None})
print(df)
#       a    b
# 0  None  NaN
# 1     a  8.0
print(type(df.a.iloc[0]))
# <class 'NoneType'>
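If the file is written by pandas itself, note that to_csv writes missing values (None as well as NaN) as empty strings by default, and read_csv parses empty strings as NaN. Here is a minimal sketch of a full round trip, assuming a hypothetical frame with columns name and score (not from the question) and using the string 'None' as the marker passed to na_rep. It only makes sense for text columns; a numeric column containing the marker would be read back as strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical example data: one text column with a missing value.
df = pd.DataFrame({"name": ["a", None, "c"], "score": [1, 2, 3]})

# Default round trip: the missing value is written as an empty field
# and comes back as NaN.
default = pd.read_csv(StringIO(df.to_csv(index=False)))

# Write missing values as the literal text "None" instead.
csv_text = df.to_csv(index=False, na_rep="None")

# keep_default_na=False stops read_csv from treating any string as NA,
# so "None" is read back as an ordinary string.
restored = pd.read_csv(StringIO(csv_text), keep_default_na=False)

# Turn the marker string back into a real None (object dtype keeps it).
restored["name"] = restored["name"].astype(object).replace({"None": None})

print(type(restored["name"].iloc[1]))
```

Because na_rep applies to every missing value, NaN and None are written the same way, so the distinction between them cannot be recovered from the CSV alone.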