
How to replace a range of values with NaN in Pandas data-frame?

I have a huge data-frame. How should I replace a range of values (-200, -100) with NaN?

Mat_python asked Oct 20 '16 16:10


2 Answers

dataframe

You can use pd.DataFrame.mask:

df.mask((df >= -200) & (df <= -100), inplace=True)

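mask replaces every element where the Boolean condition is True with a given value, which defaults to NaN when none is specified. Without inplace=True it returns a new DataFrame and leaves df unchanged.

Equivalently, use pd.DataFrame.where with the inverse condition, since where keeps the elements for which the condition is True and replaces the rest: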

df.where((df < -200) | (df > -100), inplace=True)
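
As a quick check of the equivalence, here is a minimal sketch on a small made-up frame (the column names and values are illustrative only, not from the question). It uses the non-in-place forms and compares the two results:

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"a": [-250, -150, -100, 0], "b": [-200, -99, 50, -120]})

# True where a value lies in [-200, -100] and should become NaN.
in_range = (df >= -200) & (df <= -100)

# mask: replace where the condition is True.
masked = df.mask(in_range)

# where: keep where the inverse condition is True, replace the rest.
kept = df.where((df < -200) | (df > -100))

print(masked)
print(masked.equals(kept))
```

Both endpoints are treated as part of the range here, matching the >= and <= comparisons in the answer.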

series

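Pandas provides the same methods for a Series, so for a single column df['A'] you can combine pd.Series.mask with pd.Series.between (both endpoints are included by default):

df['A'] = df['A'].mask(df['A'].between(-200, -100))

mask returns a new Series because inplace=False by default, so the result has to be assigned back to the column. The in-place form on a selected column,

df['A'].mask(df['A'].between(-200, -100), inplace=True)

is chained assignment: recent pandas versions warn about it, and with Copy-on-Write enabled it does not update df.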
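
A minimal sketch of the Series approach on made-up data (the column name A comes from the answer, the values are assumptions chosen to hit both endpoints):

```python
import pandas as pd

# Illustrative data: -200 and -100 sit exactly on the range boundaries.
df = pd.DataFrame({"A": [-300, -200, -150, -100, 10]})

# between() includes both endpoints by default.
hit = df["A"].between(-200, -100)

# Assign the masked Series back instead of mutating a selected column in place.
df["A"] = df["A"].mask(hit)

print(df["A"].tolist())
```

Assigning back to the column avoids relying on in-place modification of a selection.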
jpp answered Oct 21 '22 09:10


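You can also assign NaN directly through a Boolean mask. The session below assumes import numpy as np and import pandas as pd; because NaN is a float, the integer columns come out as floats: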

In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))

In [146]: df
Out[146]:
     a    b    c
0 -188  -63 -228
1  -59  -70  -66
2 -110   39 -146
3  -67 -228 -232
4  -22 -180 -140
5 -191 -136 -188
6  -59  -30 -128
7 -201 -244 -195
8 -248  -30  -25
9   11    1   20

In [148]: df[(df>=-200) & (df<=-100)] = np.nan

In [149]: df
Out[149]:
       a      b      c
0    NaN  -63.0 -228.0
1  -59.0  -70.0  -66.0
2    NaN   39.0    NaN
3  -67.0 -228.0 -232.0
4  -22.0    NaN    NaN
5    NaN    NaN    NaN
6  -59.0  -30.0    NaN
7 -201.0 -244.0    NaN
8 -248.0  -30.0  -25.0
9   11.0    1.0   20.0
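
The int-to-float change visible in the output above can be made explicit. The following is a sketch on made-up data, not the answerer's session: it casts to float64 first, so the assignment does not depend on implicit upcasting, which recent pandas releases may warn about.

```python
import numpy as np
import pandas as pd

# Made-up integer data for illustration.
df = pd.DataFrame({"a": [-188, -59, 11], "b": [-63, -136, 1]})

# NaN is a float, so convert explicitly before assigning it.
df = df.astype("float64")

# Boolean-mask assignment: every value in [-200, -100] becomes NaN.
df[(df >= -200) & (df <= -100)] = np.nan

print(df)
print(df.dtypes)
```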
MaxU - stop WAR against UA answered Oct 21 '22 08:10