You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
Pandas DataFrame replace() MethodThe replace() method replaces the specified value with another specified value. The replace() method searches the entire DataFrame and replaces every case of the specified value.
To replace multiple values in a DataFrame we can apply the method DataFrame. replace(). In Pandas DataFrame replace method is used to replace values within a dataframe object.
.ix
indexer works okay for pandas version prior to 0.20.0, but since pandas 0.20.0, the .ix
indexer is deprecated, so you should avoid using it. Instead, you can use .loc
or iloc
indexers. You can solve this problem by:
mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0
Or, in one line,
df.loc[df.my_channel > 20000, 'my_channel'] = 0
mask
helps you to select the rows in which df.my_channel > 20000
is True
, while df.loc[mask, column_name] = 0
sets the value 0 to the selected rows where mask
holds in the column which name is column_name
.
Update:
In this case, you should use loc
because if you use iloc
, you will get a NotImplementedError
telling you that iLocation based boolean indexing on an integer type is not available.
Try
df.loc[df.my_channel > 20000, 'my_channel'] = 0
Note: Since v0.20.0, ix
has been deprecated in favour of loc
/ iloc
.
np.where
function works as follows:
df['X'] = np.where(df['Y']>=50, 'yes', 'no')
In your case you would want:
import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
The reason your original dataframe does not update is because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
You have a few alternatives:-
loc
+ Boolean indexingloc
may be used for setting values and supports Boolean masks:
df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
mask
+ Boolean indexingYou can assign to your series:
df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)
Or you can update your series in place:
df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
np.where
+ Boolean indexingYou can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.
df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])
I would use lambda
function on a Series
of a DataFrame
like this:
f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)
I do not assert that this is an efficient way, but it works fine.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With