
How to replace non-integer values in a pandas DataFrame?

I have a DataFrame consisting of two columns, Age and Salary:

Age   Salary
21    25000
22    30000
22    Fresher
23    2,50,000
24    25 LPA
35    400000
45    10,00,000

How can I handle the non-numeric outliers in the Salary column and replace them with an integer?

yondu_udanta asked Mar 21 '17 14:03


2 Answers

To replace non-numeric values, use to_numeric with the parameter errors='coerce'. Values that cannot be parsed become NaN, which can then be filled with an integer such as 0:

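import pandas as pd

# Strip thousands separators, coerce anything non-numeric to NaN, then fill with 0
df['new'] = (pd.to_numeric(df.Salary.astype(str).str.replace(',', ''), errors='coerce')
               .fillna(0)
               .astype(int))
print(df)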
   Age     Salary      new
0   21      25000    25000
1   22      30000    30000
2   22    Fresher        0
3   23   2,50,000   250000
4   24     25 LPA        0
5   35     400000   400000
6   45  10,00,000  1000000
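
For anyone who wants to run this end to end, here is a minimal sketch of the same steps split apart. It assumes the Salary column holds strings exactly as shown in the question; the intermediate names (cleaned, parsed, replaced) are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; Salary is assumed to be stored as strings.
df = pd.DataFrame({
    "Age": [21, 22, 22, 23, 24, 35, 45],
    "Salary": ["25000", "30000", "Fresher", "2,50,000", "25 LPA", "400000", "10,00,000"],
})

# Step 1: drop the thousands separators so "2,50,000" becomes "250000".
cleaned = df["Salary"].astype(str).str.replace(",", "", regex=False)

# Step 2: parse to numbers; entries that cannot be parsed become NaN.
parsed = pd.to_numeric(cleaned, errors="coerce")

# Step 3: replace NaN with a chosen integer (0 here) and cast to int.
df["new"] = parsed.fillna(0).astype(int)

# Original values that were coerced, handy for reviewing what got replaced.
replaced = df.loc[parsed.isna(), "Salary"].tolist()
```

Keeping parsed around before calling fillna makes it easy to inspect which rows were treated as outliers, or to fill with something other than 0 (for example a median) if that suits the data better.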
jezrael answered Oct 08 '22 20:10


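Use numpy.where to find values that are not made up purely of digits and replace them with '0'. Note that comma-formatted values such as 2,50,000 are also replaced, because str.isdigit() is False for them.

# Vectorized check: keep all-digit strings, replace everything else with '0'
df['New'] = np.where(df.Salary.astype(str).str.isdigit(), df.Salary, '0')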
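
As a rough illustration of how the isdigit check behaves on the question's data, the sketch below runs it with and without removing commas first. The comma-stripping step is an addition for comparison, not part of this answer, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Salary values from the question, assumed to be strings.
salary = pd.Series(["25000", "30000", "Fresher", "2,50,000", "25 LPA", "400000", "10,00,000"])

# isdigit() on the raw strings: comma-formatted values fail the check too.
raw = np.where(salary.str.isdigit(), salary, "0")

# Assumption: strip commas first so "2,50,000" is kept as "250000".
no_commas = salary.str.replace(",", "", regex=False)
stripped = np.where(no_commas.str.isdigit(), no_commas, "0")

# The result is still strings; cast explicitly if integers are needed.
result = pd.Series(stripped).astype(int)
```

Unlike to_numeric, this check also rejects negative numbers and decimals, since str.isdigit() only accepts strings consisting entirely of digits.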
Shenglin Chen answered Oct 08 '22 20:10