
pandas.DataFrame set all string values to nan

I have a pandas.DataFrame that contains string, float, and int types.

Is there a way to set all strings that cannot be converted to float to NaN?

For example:

    A  B   C      D
0   1  2   5      7
1   0  4 NaN     15
2   4  8   9     10
3  11  5   8      0
4  11  5   8  "wajdi"

to:

    A  B   C      D
0   1  2   5      7
1   0  4 NaN     15
2   4  8   9     10
3  11  5   8      0
4  11  5   8    NaN

farhawa asked Aug 27 '16 18:08


3 Answers

You can use pd.to_numeric and set errors='coerce':

pandas.to_numeric

df['D'] = pd.to_numeric(df.D, errors='coerce')

Which will give you:

    A  B    C     D
0   1  2  5.0   7.0
1   0  4  NaN  15.0
2   4  8  9.0  10.0
3  11  5  8.0   0.0
4  11  5  8.0   NaN
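
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and coerces column D. The way the frame is constructed is an assumption (the question only shows its printed form); the pd.to_numeric call is the one from the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (construction is assumed, not from the post).
# Column D mixes numbers with one string, so pandas stores it as object dtype.
df = pd.DataFrame({
    "A": [1, 0, 4, 11, 11],
    "B": [2, 4, 8, 5, 5],
    "C": [5, np.nan, 9, 8, 8],
    "D": [7, 15, 10, 0, "wajdi"],
})

dtype_before = df["D"].dtype

# Values that cannot be parsed as numbers are replaced with NaN.
df["D"] = pd.to_numeric(df["D"], errors="coerce")

print(dtype_before, "->", df["D"].dtype)
print(df)
```

Because NaN is a float, the coerced column ends up as float64 even though the surviving values were integers.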

Deprecated solution (pandas <= 0.20 only):

df.convert_objects(convert_numeric=True)

pandas.DataFrame.convert_objects

Here's the dev note in the convert_objects source code: # TODO: Remove in 0.18 or 2017, which ever is sooner. So don't rely on this as a long-term solution if you use it.

rwhitt2049 answered Oct 14 '22 06:10


Here is a way:

df['E'] = pd.to_numeric(df.D, errors='coerce')

And then you have:


    A  B    C      D     E
0   1  2  5.0      7   7.0
1   0  4  NaN     15  15.0
2   4  8  9.0     10  10.0
3  11  5  8.0      0   0.0
4  11  5  8.0  wajdi   NaN
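
Writing the result to a new column, as above, keeps the original values around, which makes it possible to see which entries failed to convert. A small sketch of that idea, using only column D from the question (the variable name failed is just for illustration):

```python
import pandas as pd

# Only column D from the question; the other columns are not needed here.
df = pd.DataFrame({"D": [7, 15, 10, 0, "wajdi"]})
df["E"] = pd.to_numeric(df["D"], errors="coerce")

# A row failed to convert if D held a value but E came out as NaN.
failed = df.loc[df["D"].notna() & df["E"].isna(), "D"]
print(failed.tolist())  # ['wajdi']
```

The notna() check keeps values that were already missing in D from being reported as conversion failures.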

Israel Unterman answered Oct 14 '22 07:10


You can use pd.to_numeric with errors='coerce'.

In [30]: df = pd.DataFrame({'a': [1, 2, 'NaN', 'bob', 3.2]})

In [31]: pd.to_numeric(df.a, errors='coerce')
Out[31]: 
0    1.0
1    2.0
2    NaN
3    NaN
4    3.2
Name: a, dtype: float64

Here is one way to apply it to all columns:

for c in df.columns:
    df[c] = pd.to_numeric(df[c], errors='coerce')

(See comment by NinjaPuppy for a better way.)
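
The referenced comment is not reproduced here, so the following is not necessarily what it proposed. One common alternative to the explicit loop is DataFrame.apply, which forwards extra keyword arguments to the function it calls on each column. A minimal sketch, with column b added purely for illustration:

```python
import pandas as pd

# Column "a" is from the answer above; column "b" is a made-up second column.
df = pd.DataFrame({
    "a": [1, 2, "NaN", "bob", 3.2],
    "b": ["4", "x", 6, 7.5, None],
})

# apply calls pd.to_numeric once per column and passes errors="coerce" through.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted)
print(converted.dtypes)
```

This returns a new frame rather than modifying df in place, so assign it back (df = converted) if the original is no longer needed.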

Ami Tavory answered Oct 14 '22 06:10