I have a value column with all sorts of problems it returns an object dtype and it goes like this :
import pandas as pd
data = {'Observation':['1', '2', '3', '4', '5', '6', '7', '8', '9'],
'Value':['5.6' , '1 068,0', '1 956.3', '', 'NaN', '65,0', 'nan', '23,5', '30.0']}
df = pd.DataFrame(data)
the main issues are :
the desired outcome is a column with float64 datatype with correct NaN values
thank you!
Pandas series.replace accepts a dictionary and regex option:
df['Value'] = pd.to_numeric(df['Value'].replace({' ':'', ',':'.'}, regex=True),
errors='coerce')
Output:
Observation Value
0 1 5.6
1 2 1068.0
2 3 1956.3
3 4 NaN
4 5 NaN
5 6 65.0
6 7 NaN
7 8 23.5
8 9 30.0
You can just do:
df['Value'] = pd.to_numeric(df['Value'].str.replace(',', '.').str.replace(' ', ''),
errors='coerce')
Output:
Observation Value
0 1 5.6
1 2 1068.0
2 3 1956.3
3 4 NaN
4 5 NaN
5 6 65.0
6 7 NaN
7 8 23.5
8 9 30.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With