I have a pandas dataframe created from a CSV file. One column of this dataframe contains numeric data that is initially read in as strings. Most entries are numeric-like, but some contain various non-numeric error codes. I do not know beforehand what all the error codes might be or how many there are. So, for instance, the dataframe might look like:
[In 1]: df
[Out 1]:
data OtherAttr
MyIndex
0 1.4 aaa
1 error1 foo
2 2.2 bar
3 0.8 bar
4 xxx bbb
...
743733 BadData ccc
743734 7.1 foo
I want to cast df.data as a float and throw out any values that don't convert properly. Is there built-in functionality for this? Something like:
df.data = df.data.astype(float, skipbad = True)
(I know that this specific call will not work, and I don't see any kwargs in astype that do what I want.)
I guess I could write a function using try and then use pandas apply or map, but that seems like an inelegant solution. This must be a fairly common problem, right?
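Use the convert_objects method, which "attempts to infer better dtype for object columns". Note that convert_objects was deprecated in pandas 0.17 and removed in pandas 1.0; on newer versions, pd.to_numeric with errors='coerce' is the replacement: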
In [11]: df['data'].convert_objects(convert_numeric=True)
Out[11]:
0 1.4
1 NaN
2 2.2
3 0.8
4 NaN
Name: data, dtype: float64
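On pandas versions where convert_objects is no longer available, the same coercion can be sketched with pd.to_numeric. This is a minimal example, assuming a small frame rebuilt by hand from the rows shown in the question; the variable name clean is made up for illustration:

```python
import pandas as pd

# Sample frame rebuilt from the question's first five rows (an assumption,
# the real data comes from a CSV file).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)
df.index.name = "MyIndex"

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
df["data"] = pd.to_numeric(df["data"], errors="coerce")

# Optionally throw out the rows whose value failed to convert.
clean = df.dropna(subset=["data"])
```

After this, df.data is a float64 column with NaN where the error codes were, and clean holds only the rows that converted.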
In fact, you can apply this to the entire DataFrame:
In [12]: df.convert_objects(convert_numeric=True)
Out[12]:
data OtherAttr
MyIndex
0 1.4 aaa
1 NaN foo
2 2.2 bar
3 0.8 bar
4 NaN bbb
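A whole-frame version on newer pandas could look something like the sketch below. The helper coerce_numeric_columns is hypothetical (not part of pandas), and the rule it uses, converting a column only if at least one value parses as a number, is an assumption chosen to reproduce the output shown above rather than a description of how convert_objects worked internally:

```python
import pandas as pd


def coerce_numeric_columns(frame):
    """Hypothetical helper: coerce columns holding at least one numeric-like value."""
    out = frame.copy()
    for col in out.columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Leave the column alone when nothing in it parses as a number
        # (for example, a pure text column).
        if converted.notna().any():
            out[col] = converted
    return out


# Sample frame rebuilt from the question's first five rows (an assumption).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)
df.index.name = "MyIndex"

result = coerce_numeric_columns(df)
```

Be aware that a text column containing even one numeric-looking string would be coerced under this rule, so listing the columns to convert explicitly is the safer choice when you know them.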