I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?
For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':
>>> import pandas as pd
>>> df = pd.DataFrame({'name':['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'], \
        'flag':['Purple', 'Red', nan, nan, nan]})[['name', 'flag']]
>>> print df
        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]
I can make a boolean series from my criteria:
>boolean_result = df.name.str.contains('e$')
>print boolean_result
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool
I just need the crucial step to get the following result:
>>> print result_wanted
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN
                df['flag'][df.name.str.contains('e$')] = 'Blue'
                        pandas.DataFrame.mask(cond, other=nan) does exactly things you want.
It replaces values with the value of other where the condition is True.
df['flag'].mask(boolean_result, other='blue', inplace=True)
inplace=True means to perform the operation in place on the data.
If you want to replace value on condition false, you could consider using pandas.DataFrame.where()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With