I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?
For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...                    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]
I can make a boolean series from my criteria:
>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool
I just need the crucial step to get the following result:
>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN
df.loc[df.name.str.contains('e$'), 'flag'] = 'Blue'
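For anyone who wants to run it end to end, here is a minimal sketch of the same update written with df.loc on the question's sample frame. Selecting the rows and the column in one .loc call avoids chained indexing, which can trigger SettingWithCopyWarning or silently leave df unchanged. The name ends_with_e is just an illustrative variable, and str.endswith is used here as a stand-in for the regex 'e$'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

# Boolean Series: True where the name ends with the letter 'e'
ends_with_e = df['name'].str.endswith('e')

# Pick the rows with the boolean mask and the column by label in one step,
# then assign the new value to just those cells
df.loc[ends_with_e, 'flag'] = 'Blue'

print(df)
```

Rows where the mask is False (including Lindsey, whose flag is still NaN) are left untouched.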
pandas.DataFrame.mask(cond, other=nan) does exactly what you want. It replaces values with the value of other where the condition is True.
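df['flag'] = df['flag'].mask(boolean_result, other='Blue')
mask also accepts inplace=True, which performs the operation in place on the data. When the Series is selected from a DataFrame as df['flag'], though, recent pandas versions may not propagate an in-place change back to df, so assigning the result back is the safer form.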
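As a rough, self-contained sketch of the mask approach on the question's data: this version assigns the returned Series back to the column rather than relying on in-place modification, which is an assumption about what behaves most predictably across pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

# True for names ending in 'e' (the pattern is treated as a regex by default)
boolean_result = df['name'].str.contains('e$')

# mask returns a new Series with 'Blue' wherever the condition is True;
# writing it back to the column updates the DataFrame
df['flag'] = df['flag'].mask(boolean_result, other='Blue')

print(df)
```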
If you want to replace values where the condition is False instead, consider pandas.DataFrame.where().
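To illustrate how the two methods relate, here is a small sketch, again on the question's sample data, showing that where with the inverted condition gives the same Series as mask with the original condition. The variable names masked and kept are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

ends_with_e = df['name'].str.contains('e$')

# mask: replace where the condition is True
masked = df['flag'].mask(ends_with_e, other='Blue')

# where: keep where the condition is True, replace where it is False,
# so the condition is negated with ~ to get the same effect
kept = df['flag'].where(~ends_with_e, other='Blue')

# Series.equals treats NaN values in matching positions as equal
print(masked.equals(kept))
```

Which one to use is mostly a question of which way round the condition reads more naturally.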