Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

column with mixed decimals

Tags:

python

pandas

I have a value column with all sorts of problems it returns an object dtype and it goes like this :

import pandas as pd

data = {'Observation':['1', '2', '3', '4', '5', '6', '7', '8', '9'], 
        'Value':['5.6' , '1 068,0', '1 956.3', '', 'NaN', '65,0', 'nan', '23,5', '30.0']} 
  
df = pd.DataFrame(data)

the main issues are :

  • mixed decimals (point and comma)
  • thousand separator in both types
  • mixed NaN and nan and the empty string for na values
  • point 0 or comma 0 at the end

the desired outcome is a column with float64 datatype with correct NaN values

thank you!

like image 813
is147 Avatar asked Nov 24 '25 15:11

is147


2 Answers

Pandas series.replace accepts a dictionary and regex option:

df['Value'] = pd.to_numeric(df['Value'].replace({' ':'', ',':'.'}, regex=True), 
                            errors='coerce')

Output:

  Observation   Value
0           1     5.6
1           2  1068.0
2           3  1956.3
3           4     NaN
4           5     NaN
5           6    65.0
6           7     NaN
7           8    23.5
8           9    30.0
like image 167
Quang Hoang Avatar answered Nov 27 '25 04:11

Quang Hoang


You can just do:

 df['Value'] = pd.to_numeric(df['Value'].str.replace(',', '.').str.replace(' ', ''), 
                             errors='coerce')

Output:

    Observation   Value
0             1     5.6
1             2  1068.0
2             3  1956.3
3             4     NaN
4             5     NaN
5             6    65.0
6             7     NaN
7             8    23.5
8             9    30.0
like image 45
Arkadiusz Avatar answered Nov 27 '25 05:11

Arkadiusz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!