I have a huge DataFrame. How can I replace all values in the range (-200, -100) with NaN?
You can use pd.DataFrame.mask:
df.mask((df >= -200) & (df <= -100), inplace=True)
This method replaces the elements where the Boolean condition is True with a specified value (its other argument), defaulting to NaN if no value is given.
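A small sketch of both behaviours, using a made-up three-row frame (the values are illustrative, not from the question). Leaving out other fills matches with NaN, which upcasts integer columns to float; passing other=0 fills with that value instead:

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the question)
df = pd.DataFrame({"a": [-250, -150, -50], "b": [-200, -100, 0]})

# True where the value lies in [-200, -100], endpoints included
cond = (df >= -200) & (df <= -100)

# No `other`: matched cells become NaN, so int columns are upcast to float64
masked_nan = df.mask(cond)

# With `other`: matched cells get that value and the int dtype is kept
masked_zero = df.mask(cond, other=0)

print(masked_nan)
print(masked_zero)
```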
Equivalently, use pd.DataFrame.where with the reverse condition:
df.where((df < -200) | (df > -100), inplace=True)
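To check that the two forms agree, here is a sketch on a seeded random frame shaped like the one in the session further down. The seed and the use of numpy's default_rng are assumptions made for reproducibility. By De Morgan's law, the negation of (x >= -200) & (x <= -100) is (x < -200) | (x > -100), so where keeps exactly the cells that mask leaves alone:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the run is reproducible
df = pd.DataFrame(rng.integers(-250, 50, size=(10, 3)), columns=list("abc"))

# mask: replace cells where the condition is True
via_mask = df.mask((df >= -200) & (df <= -100))

# where: keep cells where the reversed condition is True, replace the rest
via_where = df.where((df < -200) | (df > -100))

# equals() treats NaNs in the same positions as equal
print(via_mask.equals(via_where))  # True
```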
As with many methods, pandas also provides Series versions that work on a single column rather than an entire DataFrame. So, for a column df['A'], you can use pd.Series.mask with pd.Series.between (which includes both endpoints by default, matching the >= / <= condition above):
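df['A'].mask(df['A'].between(-200, -100), inplace=True)  # chained inplace call; may not update df under Copy-on-Write
Note that inplace=False is the default, so mask returns a new Series that can be chained or assigned back. Assigning back is also the more reliable pattern, because in recent pandas versions (with Copy-on-Write) an inplace call on a column selection such as df['A'] may not modify df itself: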
df['A'] = df['A'].mask(df['A'].between(-200, -100))
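If the question's (-200, -100) is meant as an open interval, the endpoints should not be replaced. The sketch below compares the default closed interval with an open one. It assumes pandas >= 1.3, where between's inclusive parameter accepts "both", "neither", "left" or "right". The sample column is made up to sit on and around the endpoints:

```python
import pandas as pd

# Illustrative values on and around the interval endpoints
df = pd.DataFrame({"A": [-201, -200, -150, -100, -99]})

# Default inclusive="both": -200 and -100 are replaced too
closed_interval = df["A"].mask(df["A"].between(-200, -100))

# Open interval (-200, -100): both endpoints are kept
# (assumes pandas >= 1.3 for the string form of `inclusive`)
open_interval = df["A"].mask(df["A"].between(-200, -100, inclusive="neither"))

# Assign the result back instead of calling inplace=True on df["A"]
df["A"] = open_interval
print(df)
```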
Alternatively, you can assign NaN directly through Boolean indexing:
In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))
In [146]: df
Out[146]:
      a     b     c
0  -188   -63  -228
1   -59   -70   -66
2  -110    39  -146
3   -67  -228  -232
4   -22  -180  -140
5  -191  -136  -188
6   -59   -30  -128
7  -201  -244  -195
8  -248   -30   -25
9    11     1    20
In [148]: df[(df>=-200) & (df<=-100)] = np.nan
In [149]: df
Out[149]:
        a       b       c
0     NaN   -63.0  -228.0
1   -59.0   -70.0   -66.0
2     NaN    39.0     NaN
3   -67.0  -228.0  -232.0
4   -22.0     NaN     NaN
5     NaN     NaN     NaN
6   -59.0   -30.0     NaN
7  -201.0  -244.0     NaN
8  -248.0   -30.0   -25.0
9    11.0     1.0    20.0
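The session above is unseeded and starts from integers, which pandas upcasts to float once NaN is stored. Here is a reproducible sketch of the same Boolean-indexing approach. The seed and the up-front astype(float) are assumptions: casting first means the assignment never has to change a column's dtype.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded for reproducibility
# Cast to float up front so NaN can be stored without an int -> float upcast
df = pd.DataFrame(rng.integers(-250, 50, size=(10, 3)), columns=list("abc")).astype(float)

in_range = (df >= -200) & (df <= -100)
expected_nan_count = int(in_range.to_numpy().sum())

# Indexing with a Boolean DataFrame sets every cell where in_range is True
df[in_range] = np.nan

print(df)
```

Since comparisons with NaN evaluate to False, no cell in the result still satisfies the range condition.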