I would like to convert all the values in a pandas dataframe from strings to floats. My dataframe contains various NaN values (e.g. NaN, NA, None). For example,
import pandas as pd
import numpy as np
my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)
I have found here and here (among other places) that convert_objects might be the way to go. However, I get a message that it is deprecated (I am using Pandas 0.17.1) and that I should use to_numeric instead.
df2 = df.convert_objects(convert_numeric=True)
Output:
FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
But to_numeric doesn't seem to actually convert the strings.
df3 = pd.to_numeric(df, errors='coerce')
Output:
df2:
0 1 2
0 0.5 0.20 0.1
1 NaN 0.45 0.2
2 0.9 0.02 NaN
df2 dtypes:
0 float64
1 float64
2 float64
dtype: object
df3:
0 1 2
0 0.5 0.2 0.1
1 NA 0.45 0.2
2 0.9 0.02 N/A
df3 dtypes:
0 object
1 object
2 object
dtype: object
Should I use convert_objects and deal with the warning message, or is there a proper way to do what I want with to_numeric?
Strangely this works:
In [11]:
df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
Out[11]:
0 1 2
0 0.5 0.20 0.1
1 NaN 0.45 0.2
2 0.9 0.02 NaN
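pd.to_numeric only accepts one-dimensional input (a list, tuple, 1-d array, or Series), so it can't coerce the entire df in one call, which is a little surprising. Applying it column by column gets around that.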
If you hate typing (thanks to @Zero) then you can just use:
df.apply(pd.to_numeric, errors='coerce')
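As a quick check of the column-wise approach, here is a minimal sketch. It assumes a recent pandas where the documented way to turn unparseable values into NaN is errors='coerce', and it builds the frame directly from string lists rather than going through np.array. The name converted is just for illustration.

```python
import pandas as pd

# Same data as in the question, written directly as strings.
df = pd.DataFrame([["0.5", "0.2", "0.1"],
                   ["NA", "0.45", "0.2"],
                   ["0.9", "0.02", "N/A"]])

# apply() hands each column (a Series) to pd.to_numeric;
# errors="coerce" turns anything unparseable into NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Every column should come back as float64, with NaN where "NA" and "N/A" were.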
You can try replace and astype:
import pandas as pd
import numpy as np
my_data = np.array([[0.5, 0.2, 0.1], ["NA", 0.45, 0.2], [0.9, 0.02, "N/A"]])
df = pd.DataFrame(my_data, dtype=str)
print(df.replace({r'N': np.nan}, regex=True).astype(float))
0 1 2
0 0.5 0.20 0.1
1 NaN 0.45 0.2
2 0.9 0.02 NaN
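Note that the regex r'N' matches any cell containing a capital N, not only "NA" and "N/A". A narrower variant, sketched here under the assumption that the missing-value markers are known in advance (na_markers and cleaned are made-up names), lists them explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["0.5", "0.2", "0.1"],
                   ["NA", "0.45", "0.2"],
                   ["0.9", "0.02", "N/A"]])

# Hypothetical list of markers to treat as missing; extend as needed.
na_markers = ["NA", "N/A"]

# Replace only exact matches with NaN, then cast everything to float.
cleaned = df.replace(na_markers, np.nan).astype(float)

print(cleaned)
```

Unlike errors='coerce', astype(float) raises if any other non-numeric string is left, which can be useful when unexpected values should not be silently turned into NaN.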