I have a dataframe consisting of two columns, Age and Salary:
Age   Salary
21    25000
22    30000
22    Fresher
23    2,50,000
24    25 LPA
35    400000
45    10,00,000
How can I handle the outliers (non-numeric entries) in the Salary column and replace them with an integer?
If you need to replace non-numeric values, use to_numeric with the parameter errors='coerce'. Values that cannot be parsed become NaN, which fillna(0) then replaces with 0:
import pandas as pd

# Parentheses are required so the chained calls can span several lines.
df['new'] = (pd.to_numeric(df.Salary.astype(str).str.replace(',', ''), errors='coerce')
               .fillna(0)
               .astype(int))
print(df)
   Age     Salary      new
0   21      25000    25000
1   22      30000    30000
2   22    Fresher        0
3   23   2,50,000   250000
4   24     25 LPA        0
5   35     400000   400000
6   45  10,00,000  1000000
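For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The data is copied from the question; the `was_invalid` column is a hypothetical addition (not part of the answer above) that records which rows failed to parse before they are overwritten with 0.

```python
import pandas as pd

# Sample data copied from the question; Salary is stored as strings.
df = pd.DataFrame({
    "Age": [21, 22, 22, 23, 24, 35, 45],
    "Salary": ["25000", "30000", "Fresher", "2,50,000",
               "25 LPA", "400000", "10,00,000"],
})

# Strip the comma separators, then coerce: unparseable entries become NaN.
cleaned = pd.to_numeric(
    df["Salary"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Hypothetical helper column: remember which rows could not be parsed.
df["was_invalid"] = cleaned.isna()

# Replace NaN with the placeholder 0 and cast to integers.
df["new"] = cleaned.fillna(0).astype(int)

print(df)
```

Keeping a flag column is one way to tell a genuine salary of 0 apart from a value that was replaced; whether that matters depends on how the column is used later.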
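Alternatively, use numpy.where to find non-digit values and replace them with '0'. Comma-formatted values such as 2,50,000 are not all digits, so they are replaced too, and the values that are kept are not converted to integers:
df['New'] = np.where(df.Salary.astype(str).str.isdigit(), df.Salary, '0')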
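If the numpy.where approach should also keep comma-formatted salaries and return integers, one possible variant is sketched below. It assumes the sample data from the question; stripping the commas first and casting with astype(int) are additions to the answer above, not part of it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; Salary is stored as strings.
df = pd.DataFrame({
    "Age": [21, 22, 22, 23, 24, 35, 45],
    "Salary": ["25000", "30000", "Fresher", "2,50,000",
               "25 LPA", "400000", "10,00,000"],
})

# Remove commas so values like "2,50,000" count as all-digit strings.
stripped = df["Salary"].astype(str).str.replace(",", "", regex=False)

# Keep all-digit strings, substitute "0" elsewhere, then convert to integers.
df["New"] = np.where(stripped.str.isdigit(), stripped, "0").astype(int)

print(df)
```

On this data the result should match the to_numeric version: Fresher and 25 LPA become 0, and every other row keeps its numeric value.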