
force pandas to read nan as string

Tags:

python

pandas

I could not find any other question related to mine. Please help me with a link, if I missed it...

I have a csv-file looking like this:

"concentration"
"5"
"5"
"5"
"5"
"5"

"nan"
"nan"
"nan"
"nan"
"nan"

When I read it with pandas' read_csv, the "nan" strings are automatically interpreted as NaN, but I would like to keep them as strings. The only value that should be NaN is the one on line 7, where nothing is written.

I tried to read it like this:

df = pd.read_csv(path, dtype= {'concentration': 'string'}, quoting = csv.QUOTE_NONNUMERIC, sep=',')

Can anybody help?

Antje Janosch asked Nov 27 '14 14:11


1 Answer

Looks like you can use keep_default_na and na_values. From the docs:

na_values : list-like or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values

keep_default_na : bool, default True
If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they’re appended to

So here's the code

pd.read_csv('c:\\temp\\temp.txt', keep_default_na=False, na_values=[''])

   concentration
0              5
1              5
2              5
3              5
4              5
5            NaN
6            nan
7            nan
8            nan
9            nan
10           nan
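
For anyone who wants to try this without a file on disk, here is a minimal, self-contained sketch of the same idea. It rests on a few assumptions that are not in the original post: the data is inlined through io.StringIO, the missing row is written as "" (recent pandas versions skip completely blank lines by default via skip_blank_lines=True), and dtype=str is passed so that "5" and "nan" both come back as plain strings.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file (assumption: the missing value is
# written as an empty quoted field rather than a fully blank line).
csv_text = '"concentration"\n"5"\n"5"\n""\n"nan"\n"nan"\n'

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype=str,              # keep every non-missing value as a string
    keep_default_na=False,  # do not treat "nan", "NA", "NULL", ... as missing
    na_values=[""],         # only the empty string counts as missing
)

values = df["concentration"].tolist()
print(values)
```

Only the empty field should end up as a real missing value; the literal "nan" entries stay ordinary strings, which can be checked with pd.isna on each element.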
Roman Pekar answered Sep 20 '22 02:09