I am trying to replace "bad values" that fall below or above given thresholds with a default value (e.g. setting them to NaN). I am using a NumPy array with 1,000,000 values or more, so performance is an issue.
My prototype does the operation in two steps. Is there a possibility to do this in one step?
import numpy as np
data = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
upper_threshold = 7
lower_threshold = 1
default_value = np.nan
# is it possible to do this in one expression?
data[data > upper_threshold] = default_value
data[data < lower_threshold] = default_value
print(data)  # [ nan 1. 2. 3. 4. 5. 6. 7. nan nan]
As commented in this related question (Pythonic way to replace list values with upper and lower bound (clamping, clipping, thresholding)?):
Like many other functions, np.clip is python, but it defers to arr.clip, the method. For regular arrays that method is compiled, so will be faster (about 2x). – hpaulj
I hope to find a faster way too, thanks in advance!
Use boolean indexing in one go with a combined mask -
data[(data > upper_threshold) | (data < lower_threshold)] = default_value
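For reference, here is a minimal self-contained sketch of the combined-mask approach on the sample data from the question. The names `out_of_range` and `cleaned` are made up for illustration, and the `np.where` variant is an addition not in the original answer: it builds a new array instead of modifying `data` in place, which may or may not be what you want.

```python
import numpy as np

data = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
upper_threshold = 7
lower_threshold = 1
default_value = np.nan

# Build the combined mask once: True wherever a value is outside the bounds.
out_of_range = (data > upper_threshold) | (data < lower_threshold)

# Variant 1 (assumption, not from the answer): np.where returns a new array
# and leaves `data` untouched.
cleaned = np.where(out_of_range, default_value, data)

# Variant 2: in-place assignment through the mask, as in the one-liner above.
data[out_of_range] = default_value

print(cleaned)
print(data)
```

Both variants should mark the same positions (here 0, 8 and 9) as NaN. The in-place form avoids allocating a second float array, which matters for very large inputs.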
Runtime test -
In [109]: def onepass(data, upper_threshold, lower_threshold, default_value):
     ...:     mask = (data > upper_threshold) | (data < lower_threshold)
     ...:     data[mask] = default_value
     ...:
     ...: def twopass(data, upper_threshold, lower_threshold, default_value):
     ...:     data[data > upper_threshold] = default_value
     ...:     data[data < lower_threshold] = default_value
     ...:
In [110]: upper_threshold = 7
     ...: lower_threshold = 1
     ...: default_value = np.nan
     ...:
In [111]: data = np.random.randint(-4,11,(1000000)).astype(float)
In [112]: %timeit twopass(data, upper_threshold, lower_threshold, default_value)
100 loops, best of 3: 2.41 ms per loop
In [113]: data = np.random.randint(-4,11,(1000000)).astype(float)
In [114]: %timeit onepass(data, upper_threshold, lower_threshold, default_value)
100 loops, best of 3: 2.74 ms per loop
It doesn't look like the proposed one-pass indexing method performs any better. The reason could be that OR-ing the two masks is a bit more expensive than directly assigning values with boolean indexing.
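One caveat when reproducing these timings: both functions modify `data` in place. After the first `%timeit` loop the out-of-range values are already NaN, and comparisons against NaN are False, so later loops mostly measure mask construction rather than assignment. A rough sketch of a timing harness that gives every run a fresh copy could look like this. The helper `time_inplace` and the name `base` are hypothetical, and no timings are claimed here, since they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def onepass(data, upper_threshold, lower_threshold, default_value):
    mask = (data > upper_threshold) | (data < lower_threshold)
    data[mask] = default_value

def twopass(data, upper_threshold, lower_threshold, default_value):
    data[data > upper_threshold] = default_value
    data[data < lower_threshold] = default_value

# Same test data shape as in the session above.
base = np.random.randint(-4, 11, 1000000).astype(float)

def time_inplace(func, repeats=20):
    """Hypothetical helper: best time in seconds over `repeats` runs,
    each run working on a fresh copy of `base`."""
    best = float("inf")
    for _ in range(repeats):
        work = base.copy()  # the copy happens outside the timed region
        start = timeit.default_timer()
        func(work, 7, 1, np.nan)
        best = min(best, timeit.default_timer() - start)
    return best

print("twopass: %.3f ms" % (time_inplace(twopass) * 1e3))
print("onepass: %.3f ms" % (time_inplace(onepass) * 1e3))
```

Taking the minimum over several runs mirrors what `%timeit` reports as "best of", while the per-run copy keeps the amount of assignment work the same in every run.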