
Is there a method to skip unconvertible rows when casting a pandas Series from str to float?

Tags:

python

pandas

I have a pandas DataFrame created from a CSV file. One column of this DataFrame contains numeric data that is initially read in as strings. Most entries are numeric-like, but some contain various non-numeric error codes. I do not know beforehand what all the error codes might be or how many there are. So, for instance, the DataFrame might look like:

[In 1]: df
[Out 1]:
            data     OtherAttr
MyIndex
0           1.4        aaa
1           error1     foo
2           2.2        bar
3           0.8        bar
4           xxx        bbb
...
743733      BadData    ccc
743734      7.1        foo

I want to cast df.data to float and throw out any values that don't convert properly. Is there built-in functionality for this? Something like:

df.data = df.data.astype(float, skipbad = True)

(Although I know that specifically will not work, and I don't see any kwargs in astype that do what I want.)

I guess I could write a function using try and then use pandas apply or map, but that seems like an inelegant solution. This must be a fairly common problem, right?

user2543645 asked Aug 21 '13 21:08


1 Answer

Use the convert_objects method, which "attempts to infer better dtype for object columns":

In [11]: df['data'].convert_objects(convert_numeric=True)
Out[11]: 
0    1.4
1    NaN
2    2.2
3    0.8
4    NaN
Name: data, dtype: float64
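
Note that convert_objects was later deprecated and, as far as I know, is no longer available in pandas 1.0 and later. On those versions, pd.to_numeric with errors="coerce" gives the same NaN-for-bad-values result for a single column. A minimal sketch, assuming a small sample frame modelled on the one in the question (the values are illustrative, not the asker's real data):

```python
import pandas as pd

# Sample data modelled on the question; the values are illustrative.
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
converted = pd.to_numeric(df["data"], errors="coerce")

# To actually throw out the bad rows, drop those where the conversion failed.
clean = df.assign(data=converted).dropna(subset=["data"])
```

Here converted keeps the original index with NaN in the unparseable positions, while clean keeps only the rows whose data value parsed.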

In fact, you can apply this to the entire DataFrame:

In [12]: df.convert_objects(convert_numeric=True)
Out[12]: 
         data OtherAttr
MyIndex                
0         1.4       aaa
1         NaN       foo
2         2.2       bar
3         0.8       bar
4         NaN       bbb
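
On pandas versions where convert_objects is unavailable, a rough whole-frame equivalent can be sketched with pd.to_numeric applied column by column. The helper name coerce_numeric_columns and the "convert only if at least one value parses" rule are assumptions made for this illustration; they approximate, but are not guaranteed to match, what convert_objects did.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def coerce_numeric_columns(frame):
    """Hypothetical helper: coerce text columns to numbers where any value parses."""
    out = frame.copy()
    for col in out.columns:
        # Only look at text-like columns; leave numeric, datetime, etc. alone.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        # Keep the conversion only if at least one value was numeric;
        # otherwise the column stays as text (like OtherAttr here).
        if converted.notna().any():
            out[col] = converted
    return out


# Sample data modelled on the question; the values are illustrative.
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    }
)
result = coerce_numeric_columns(df)
```

The input frame is copied rather than modified, so df still holds the original strings afterwards.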
Andy Hayden answered Oct 17 '22 08:10