Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?

Tags:

python

pandas

I have a Pandas dataframe that I'm working with and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal place. For example...

id    credit_score    column_b    column_c
0     750             ...         ...
1     653             ...         ...
2     741             ...         ...
3     65100           ...         ...
4     73500           ...         ...
5     565             ...         ...
6     480             ...         ...
7     78900           ...         ...
8     699             ...         ...
9     71500           ...         ...

So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.

like image 202
ScottP Avatar asked Jan 08 '17 15:01

ScottP


3 Answers

You can use mask:

df.credit_score = df.credit_score.mask( df.credit_score > 800, df.credit_score/ 100)

Or numpy.where:

df.credit_score = np.where( df.credit_score > 800, df.credit_score/ 100, df.credit_score)

print (df)
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0
like image 98
jezrael Avatar answered Oct 08 '22 07:10

jezrael


You can use Series.apply. It accepts a function and applies it on every element in the series. Note that it is not inplace and you will in need to reassign the series that it returns, either to a new column or to the same column.

def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #      return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
like image 28
DeepSpace Avatar answered Oct 08 '22 09:10

DeepSpace


I'd use Pandas boolean indexing:

In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
like image 20
MaxU - stop WAR against UA Avatar answered Oct 08 '22 09:10

MaxU - stop WAR against UA