I am on Python 3.7 and pandas 0.24.2.
Setup:
import pandas as pd
s = pd.Series(['10', '12', '15', '20', 'A', '31', 'C', 'D'])
In [36]: s
Out[36]:
0 10
1 12
2 15
3 20
4 A
5 31
6 C
7 D
dtype: object
to_numeric with errors='coerce'
pd.to_numeric(s, errors='coerce')
Out[37]:
0 10.0
1 12.0
2 15.0
3 20.0
4 NaN
5 31.0
6 NaN
7 NaN
dtype: float64
to_numeric with errors='' (empty string)
pd.to_numeric(s, errors='')
Out[38]:
0 10.0
1 12.0
2 15.0
3 20.0
4 NaN
5 31.0
6 NaN
7 NaN
dtype: float64
to_numeric with errors='ljljalklag' (i.e., an arbitrary string)
pd.to_numeric(s, errors='ljljalklag')
Out[39]:
0 10.0
1 12.0
2 15.0
3 20.0
4 NaN
5 31.0
6 NaN
7 NaN
dtype: float64
In other words, passing any string other than 'raise' or 'ignore' to the errors parameter of pd.to_numeric is equivalent to errors='coerce'.
Is this a feature or a bug?
This has been fixed in version 0.25.0, which validates the errors keyword (see #26394).
New behavior in 0.25.0:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0'
In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
---------------------------------------------------------------------------
ValueError: invalid error value specified
Previous behavior in 0.24.2:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.2'
In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
Out[2]: array([1. , nan, 2.2])
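If the code still has to run on 0.24.2, one option is to check the argument yourself before calling pd.to_numeric. The following is only a minimal sketch: check_errors_arg and strict_to_numeric are hypothetical helper names (not part of pandas), and the error message wording is my own.

```python
# Hypothetical helpers, not part of pandas: reject unknown `errors` values
# up front so that older pandas versions do not silently coerce.
VALID_ERRORS = ("ignore", "raise", "coerce")


def check_errors_arg(errors):
    """Return `errors` unchanged, or raise ValueError if it is not recognised."""
    if errors not in VALID_ERRORS:
        raise ValueError(
            "invalid error value specified: {!r}; expected one of {}".format(
                errors, VALID_ERRORS
            )
        )
    return errors


def strict_to_numeric(values, errors="raise"):
    """Validate `errors`, then delegate to pd.to_numeric."""
    check_errors_arg(errors)
    # Imported here so the validation above works even without pandas installed.
    import pandas as pd

    return pd.to_numeric(values, errors=errors)
```

With this wrapper, strict_to_numeric([1, 'a', 2.2], errors='foo') raises a ValueError on any pandas version instead of quietly returning NaN for 'a'.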
AFAIK, this is intended behavior, judging from the source code:
# pandas/core/tools/numeric.py
...
coerce_numeric = errors not in ("ignore", "raise") # line 147
...
So it only checks whether errors is 'raise' or 'ignore'; any other value falls through to coerce by default.
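To make the effect of that expression concrete, here is a small standalone sketch that evaluates the same membership test for the values tried in the question. The function name coerce_flag is made up for illustration; only the expression inside it comes from the quoted line.

```python
def coerce_flag(errors):
    # Same expression as the quoted line from pandas/core/tools/numeric.py
    return errors not in ("ignore", "raise")


for value in ("coerce", "", "ljljalklag", "raise", "ignore"):
    print(repr(value), coerce_flag(value))

# 'coerce' True
# '' True
# 'ljljalklag' True
# 'raise' False
# 'ignore' False
```

Every string other than 'raise' and 'ignore' yields True, which is why '' and 'ljljalklag' behave exactly like 'coerce' in 0.24.2.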