
Prevent pandas from reading None as NaN

I cleaned a dataset and had to replace a lot of NaN values with None. I then saved it to a new CSV file, but when I read the cleaned dataset back with pandas.read_csv, all the None values are represented as NaN. How can I avoid this?

Effective_cellist asked Feb 03 '17 15:02




1 Answer

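You can pass keep_default_na=False together with na_values=['NaN'] to read_csv, so that only the string NaN is parsed as missing and the string None is read as ordinary text. Then replace the string 'None' with the value None: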

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions

temp=u"""a,b
None,NaN
a,8"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp),keep_default_na=False,na_values=['NaN'])

print (df)
      a    b
0  None  NaN
1     a  8.0

print (type(df.a.iloc[0]))
<class 'str'>

df = df.replace({'None':None})
print (df)
      a    b
0  None  NaN
1     a  8.0

print (type(df.a.iloc[0]))
<class 'NoneType'>
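
If you need this in more than one place, the two steps can be wrapped in a small helper. This is only a minimal sketch: read_csv_keep_none and its none_token argument are hypothetical names of mine, not part of pandas, and it assumes the file contains the literal text None in the affected cells. It rebuilds each non-numeric column as an object column so that None is stored as a real None:

```python
from io import StringIO

import pandas as pd


def read_csv_keep_none(source, none_token="None"):
    """Hypothetical helper: read a CSV and turn the text `none_token` into None."""
    # Only the string 'NaN' is parsed as missing; 'None' stays plain text.
    df = pd.read_csv(source, keep_default_na=False, na_values=["NaN"])
    for col in df.columns:
        # Numeric columns cannot contain the text token, so leave them alone.
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        values = [None if v == none_token else v for v in df[col]]
        # dtype=object keeps None as None instead of converting it to NaN.
        df[col] = pd.Series(values, index=df.index, dtype=object)
    return df


sample = "a,b\nNone,NaN\na,8"
df = read_csv_keep_none(StringIO(sample))
print(df)
print(type(df["a"].iloc[0]))
```

Numeric columns such as b still use NaN for missing values, because a float column has no way to hold None.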
jezrael answered Sep 20 '22 00:09
