I have a DataFrame with two columns, Age and Salary:
Age Salary
21 25000
22 30000
22 Fresher
23 2,50,000
24 25 LPA
35 400000
45 10,00,000
How can I handle the outliers in the Salary column and replace them with an integer?
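If you need to replace non-numeric values, use to_numeric with the parameter errors='coerce'. Values that cannot be parsed become NaN, which can then be filled with 0 and cast to int:
df['new'] = (pd.to_numeric(df.Salary.astype(str).str.replace(',', ''), errors='coerce')
               .fillna(0)
               .astype(int))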
print (df)
Age Salary new
0 21 25000 25000
1 22 30000 30000
2 22 Fresher 0
3 23 2,50,000 250000
4 24 25 LPA 0
5 35 400000 400000
6 45 10,00,000 1000000
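For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same to_numeric approach. The sample frame is rebuilt from the question with Salary assumed to be stored as strings, and the was_invalid column is a hypothetical addition (not part of the answer above) that records which rows were coerced, since a real salary of 0 would otherwise be indistinguishable from a replaced one.

```python
import pandas as pd

# Sample data from the question; Salary is assumed to be stored as strings.
df = pd.DataFrame({
    "Age": [21, 22, 22, 23, 24, 35, 45],
    "Salary": ["25000", "30000", "Fresher", "2,50,000",
               "25 LPA", "400000", "10,00,000"],
})

# Drop thousands separators so "2,50,000" parses as 250000.
cleaned = df["Salary"].astype(str).str.replace(",", "", regex=False)

# Anything that still is not a number becomes NaN.
numeric = pd.to_numeric(cleaned, errors="coerce")

# Hypothetical helper column: True where the original value was not numeric.
df["was_invalid"] = numeric.isna()

# Replace the NaN placeholders with 0 and convert to integers.
df["new"] = numeric.fillna(0).astype(int)

print(df)
```

Whether 0 is the right replacement depends on the analysis; the flag column makes it possible to swap in a different value later.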
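Use numpy.where to find non-digit values and replace them with '0'. Strip the commas first, otherwise values such as 2,50,000 also fail the isdigit check and get replaced. Note that the result is still a column of strings:
s = df.Salary.astype(str).str.replace(',', '')
df['New'] = np.where(s.str.isdigit(), s, '0')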
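The digit-check idea can also be written with pandas alone and finished with an integer cast, which is what the question asks for. This is only a sketch under the assumption that Salary holds strings; the shortened sample frame is made up from the question's values for illustration.

```python
import pandas as pd

# Shortened sample from the question; Salary is assumed to be stored as strings.
df = pd.DataFrame({"Salary": ["25000", "Fresher", "2,50,000", "25 LPA"]})

# Remove thousands separators before testing for digits.
stripped = df["Salary"].astype(str).str.replace(",", "", regex=False)

# Keep values made up only of digits, substitute "0" elsewhere,
# then convert the whole column to integers.
df["New"] = stripped.where(stripped.str.isdigit(), "0").astype(int)

print(df)
```

Series.where keeps the entries where the condition is True and substitutes the second argument everywhere else, so it plays the same role as numpy.where here.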