In my application, I receive a pandas DataFrame (say, block
), that has a column called est
. This column can contain a mix of strings or floats. I need to convert all values in the column to floats and have the column type be float64
. I do so using the following code:
block[est].convert_objects(convert_numeric=True)
block[est].astype('float')
This works for most cases. However, in one case, est
contains all empty strings. In this case, the first statement executes without error, but the empty strings in the column remain empty strings. The second statement then causes an error: ValueError: could not convert string to float:
.
How can I modify my code to handle a column with all empty strings?
Edit: I know I can just do block[est].replace("", np.NaN)
, but I was wondering if there's some way to do it with just convert_objects
or astype
that I'm missing.
Clarification: For project-specific reasons, I need to use pandas 0.16.2.
Here's an interaction with some sample data that demonstrates the failure:
>>> block = pd.DataFrame({"eps":["", ""]})
>>> block = block.convert_objects(convert_numeric=True)
>>> block["eps"]
0
1
Name: eps, dtype: object
>>> block["eps"].astype('float')
...
ValueError: could not convert string to float:
It's easier to do it using:
pandas.to_numeric
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html
import pandas as pd
df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})
df['eps'] = pd.to_numeric(df['eps'], errors='coerce')
'coerce' will convert any value error to NaN
df['eps'].astype('float')
0 1.0
1 1.6
2 1.6
3 NaN
4 NaN
5 NaN
Name: eps, dtype: float64
Then you can apply other functions without getting errors :
df['eps'].round()
0 1.0
1 2.0
2 2.0
3 NaN
4 NaN
5 NaN
Name: eps, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With