Let's say I have an array of values, r, which range anywhere from 0 to 1. I want to remove all values that are more than some threshold value away from the median. Let's assume here that the threshold value is 0.5, and len(r) = 3000. Then to mask out all values outside of this range, I can do a simple list comprehension, which I like:
mask = np.array([ri < np.median(r)-0.5 or ri > np.median(r)+0.5 for ri in r])
And if I use a timer on it:
import time
import numpy as np
start = time.time()
r = np.random.random(3000)
m = np.median(r)
minr, maxr = m-0.5, m+0.5
mask = [ri<minr or ri>maxr for ri in r]
end = time.time()
print('Took %.4f seconds'%(end-start))
>>> Took 0.0010 seconds
Is there a faster way to do this list comprehension and make the mask using NumPy?
Edit:
I've tried several of the suggestions below, including:
An element-wise or operator: (r<minr) | (r>maxr)
A NumPy logical or: r[np.logical_or(r<minr, r>maxr)]
An absolute-difference boolean array: abs(m-r) > 0.5
And here is the average time each one took over 300 runs:
Python list comprehension: 0.6511 ms
Elementwise or: 0.0138 ms
Numpy logical or: 0.0241 ms
Absolute difference: 0.0248 ms
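As you can see, the elementwise or was always the fastest: nearly a factor of two ahead of the other NumPy approaches, and roughly 47 times faster than the list comprehension (I don't know how that would scale with the number of array elements). Who knew.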
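For anyone who wants to check how the comparison scales with array size, here is a rough sketch using timeit. The helper name time_masks, the fixed seed, and the array sizes are my own choices for illustration, not part of the original measurements, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def time_masks(n, threshold=0.5, repeats=300):
    """Average time in ms per run for each masking approach on n random values."""
    rng = np.random.default_rng(0)  # fixed seed, assumed only for repeatability
    r = rng.random(n)
    m = np.median(r)
    minr, maxr = m - threshold, m + threshold
    candidates = {
        "list comprehension": lambda: [ri < minr or ri > maxr for ri in r],
        "elementwise or": lambda: (r < minr) | (r > maxr),
        "np.logical_or": lambda: np.logical_or(r < minr, r > maxr),
        "absolute difference": lambda: np.abs(r - m) > threshold,
    }
    # timeit returns total seconds for `repeats` calls; convert to ms per call
    return {name: 1000 * timeit.timeit(fn, number=repeats) / repeats
            for name, fn in candidates.items()}

# Arbitrary example sizes; add larger ones to see the trend
for n in (3_000, 30_000):
    for name, ms in time_masks(n, repeats=50).items():
        print(f"n={n:>6}  {name:<20} {ms:.4f} ms")
```

Measuring only the mask construction (not the random number generation or the median) keeps the comparison focused on the part that differs between approaches.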
One liner...
new_mask = abs(np.median(r) - r) > 0.5
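To see that this gives the same mask as the two-sided comparison, here is a tiny worked example. The five values and the threshold of 0.3 are made up for illustration so that something actually gets masked.

```python
import numpy as np

r = np.array([0.0, 0.1, 0.5, 0.6, 1.0])  # made-up sample values
threshold = 0.3                          # assumed threshold for this example
m = np.median(r)                         # 0.5

# Absolute-difference form: True where a value is too far from the median
abs_mask = np.abs(m - r) > threshold

# Two-sided form: True where a value falls below or above the allowed band
minr, maxr = m - threshold, m + threshold
or_mask = (r < minr) | (r > maxr)

print(abs_mask.tolist())                 # [True, True, False, False, True]
print(np.array_equal(abs_mask, or_mask)) # True
```

One caveat: for values sitting almost exactly on the boundary, floating-point rounding could in principle make the two forms disagree, so treat them as equivalent in practice rather than bit-for-bit guaranteed.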
You can use NumPy boolean indexing to create a new array without those values.
start = time.time()
m = np.median(r)
minr, maxr = m-0.5, m+0.5
# ~ inverts the outlier mask, so only values within the range are kept
filtered_array = r[ ~((r < minr) | (r > maxr)) ]
end = time.time()
print('Took %.4f seconds'%(end-start))
filtered_array is a copy of r without the masked values (every value that the mask would later remove is already gone from filtered_array).
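A small sketch of what "copy" means here, again with made-up values and an assumed threshold of 0.3: boolean indexing allocates a new array, so changing the filtered result does not touch r.

```python
import numpy as np

r = np.array([0.0, 0.1, 0.5, 0.6, 1.0])  # made-up sample values
m = np.median(r)
outliers = np.abs(r - m) > 0.3           # threshold 0.3 assumed for illustration

kept = r[~outliers]                      # boolean indexing returns a new array
kept[0] = -1.0                           # modify the filtered result...

print(r.tolist())                        # [0.0, 0.1, 0.5, 0.6, 1.0], r is unchanged
print(np.shares_memory(r, kept))         # False
```

If you need to change values in r itself, assign through the mask instead, for example r[outliers] = np.nan.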
Update: used shorter syntax suggested by @ayhan.