Why is pd.to_numeric `errors=''` equivalent to `errors='coerce'`?

I am on Python 3.7 and pandas 0.24.2.

Setup:

import pandas as pd
s = pd.Series(['10', '12', '15', '20', 'A', '31', 'C', 'D'])

In [36]: s
Out[36]:
0    10
1    12
2    15
3    20
4     A
5    31
6     C
7     D
dtype: object

to_numeric with errors='coerce'

pd.to_numeric(s, errors='coerce')

Out[37]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

to_numeric with errors='' (empty string)

pd.to_numeric(s, errors='')

Out[38]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

to_numeric with errors='ljljalklag', i.e. an arbitrary string

pd.to_numeric(s, errors='ljljalklag')

Out[39]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

In other words, passing any string other than 'raise' or 'ignore' as the errors parameter of pd.to_numeric is equivalent to errors='coerce'.

Is this a feature or a bug?

Andy L. asked Jul 31 '19 08:07


2 Answers

This was fixed in version 0.25.0, which now validates the errors keyword (see #26394).

New behavior in 0.25.0:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0'

In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
---------------------------------------------------------------------------
ValueError: invalid error value specified

Previous behavior in 0.24.2:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.2'

In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
Out[2]: array([1. , nan, 2.2])
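
For code that still has to run on 0.24.x, one option is to validate the keyword yourself before calling pd.to_numeric. The following is only a minimal sketch: the helper name check_errors_keyword and the constant VALID_ERRORS are hypothetical (they are not part of pandas), and the error message is an illustration rather than the exact text pandas emits.

```python
# Hypothetical helper, not part of pandas: rejects unknown values of the
# `errors` keyword up front, which pandas 0.24.2 does not do on its own.
VALID_ERRORS = ("raise", "coerce", "ignore")


def check_errors_keyword(errors):
    """Return `errors` unchanged if it is a documented value, else raise."""
    if errors not in VALID_ERRORS:
        raise ValueError(f"invalid error value specified: {errors!r}")
    return errors
```

It could then be used as pd.to_numeric(s, errors=check_errors_keyword(value)), so that a typo fails loudly instead of silently coercing.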

root answered Sep 16 '22 16:09


AFAIK, this is intended behavior, judging by the source code:

# pandas/core/tools/numeric.py
... 
coerce_numeric = errors not in ("ignore", "raise") # line 147
...

So it only checks whether errors is 'raise' or 'ignore'; any other value falls through to coercion by default.
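
To see why every unrecognized string behaves like 'coerce', the quoted line can be modeled in plain Python. This is a simplified sketch that mirrors only that one expression; the function name resolve_mode is hypothetical, and the rest of to_numeric is deliberately left out.

```python
def resolve_mode(errors):
    """Hypothetical stand-in for the quoted check in pandas 0.24.2."""
    # Same expression as the quoted source line: anything that is not
    # 'ignore' or 'raise' turns coercion on.
    coerce_numeric = errors not in ("ignore", "raise")
    return "coerce" if coerce_numeric else errors


for value in ("raise", "ignore", "coerce", "", "ljljalklag"):
    print(repr(value), "->", resolve_mode(value))
```

Only 'raise' and 'ignore' keep their own meaning here; 'coerce', the empty string, and the random string all end up on the coercion path, which matches the outputs shown in the question.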

Chris answered Sep 18 '22 16:09