I have a Pandas dataframe with a single column of strings. I want to convert the column data to float. Some of the values cannot be converted to float due to their format. I want to omit these "illegal strings" from the result and only extract values that can be legally re-cast as floats. The starting data:
import pandas as pd
test=pd.DataFrame()
test.loc[0,'Value']='<3'
test.loc[1,'Value']='10'
test.loc[2,'Value']='Detected'
test.loc[3,'Value']=''
The desired output contains only strings that could be re-cast as floats (in this case, 10):
cleanDF=test['Value'].astype(float)
cleanDF
1    10.0
Name: Value, dtype: float64
Of course, this throws an error as expected on the illegal string for float conversion:
ValueError: could not convert string to float: <3
Is there a simple way to solve this if the dataframe is large and contains many illegal strings in 'Value'?
Thanks.
You could try using apply on the column. Write a function that includes an exception handler and apply it to test['Value'].
def test_apply(x):
    try:
        return float(x)
    except ValueError:
        return None
cleanDF = test['Value'].apply(test_apply).dropna()
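For a large column, a vectorized alternative may be worth trying instead of a Python-level function. The sketch below is not part of the answer above; it uses pd.to_numeric with errors='coerce', which replaces unparseable strings with NaN, and then drops those rows. The sample data is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the question.
test = pd.DataFrame({'Value': ['<3', '10', 'Detected', '']})

# errors='coerce' turns anything that cannot be parsed into NaN
# instead of raising ValueError.
converted = pd.to_numeric(test['Value'], errors='coerce')

# Drop the NaN rows so only the values that parsed remain.
cleanDF = converted.dropna()
print(cleanDF)
```

Both approaches keep the original index, so the surviving value stays at index 1. Note that a string such as 'nan' parses to NaN and is therefore dropped as well.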