Pandas modify column values in place based on boolean array

Tags: python, pandas

I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?

For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...                    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)

        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]

I can make a boolean series from my criteria:

>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool

I just need the crucial step to get the following result:

>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN

asked May 01 '14 01:05 by prooffreader


2 Answers

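Select the rows with the boolean mask and the column by label in a single .loc call, then assign:

df.loc[df.name.str.contains('e$'), 'flag'] = 'Blue'

The chained form df['flag'][df.name.str.contains('e$')] = 'Blue' is not guaranteed to modify df: it can raise SettingWithCopyWarning, and with copy-on-write enabled it leaves df unchanged.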
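
As a rough, self-contained sketch of the boolean-mask assignment on the question's data: the variable name ends_with_e is made up for illustration, and str.endswith('e') is swapped in here as a plain-string alternative to the regex 'e$'.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

# Boolean Series: True where the name ends with the letter 'e'.
# (ends_with_e is a hypothetical name chosen for this sketch.)
ends_with_e = df['name'].str.endswith('e')

# One indexing call: rows picked by the mask, column picked by label.
df.loc[ends_with_e, 'flag'] = 'Blue'

print(df)
```

Rows where the mask is False keep whatever value they had, so 'Purple', 'Red' and the remaining NaN are left alone.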

answered Oct 08 '22 18:10 by U2EF1


pandas.DataFrame.mask(cond, other=nan) does exactly what you want; the same method exists on a single column as pandas.Series.mask.

It replaces values with the value of other where the condition is True.

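df['flag'] = df['flag'].mask(boolean_result, other='Blue')

mask returns a new Series, so the result is assigned back to the column. Passing inplace=True on df['flag'] instead is a chained operation that is not guaranteed to update df, and with copy-on-write enabled it does not.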

If you want to replace values where the condition is False instead, consider pandas.DataFrame.where(), which keeps values where the condition is True and replaces the rest.
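
To make the relationship between the two methods concrete, here is a small sketch on the question's data. It assumes the non-inplace form with the result assigned back to the column; the names masked and kept are invented for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame and the boolean Series from the question.
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})
boolean_result = df['name'].str.contains('e$')

# mask: replace values where the condition is True, keep the rest.
masked = df['flag'].mask(boolean_result, 'Blue')

# where: keep values where the condition is True, replace the rest,
# so the condition is negated to reach the same result.
kept = df['flag'].where(~boolean_result, 'Blue')

# Both spellings yield the same Series; assign it back to update df.
assert masked.equals(kept)
df['flag'] = masked

print(df)
```

Which of the two reads better mostly depends on whether the condition describes the rows to change (mask) or the rows to keep (where).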


answered Oct 08 '22 18:10 by Ynjxsjmh