Instead of finding all the samples / data points within a list or an array which are greater than a particular threshold, I would like to find only the first samples where a signal becomes greater than a threshold. The signal might cross the threshold several times. For example, if I have an example signal:
signal = [1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0]
and threshold = 2, then
signal = numpy.array(signal)
is_bigger_than_threshold = signal > threshold
would give me a boolean mask marking all values in signal which are greater than threshold.
However, I would like to get only the first samples whenever signal becomes greater than threshold. Therefore, I go through the whole list and make boolean comparisons like
first_bigger_than_threshold = list()
first_bigger_than_threshold.append(False)
for i in xrange(1, len(is_bigger_than_threshold)):
    if(is_bigger_than_threshold[i] == False):
        val = False
    elif(is_bigger_than_threshold[i]):
        if(is_bigger_than_threshold[i - 1] == False):
            val = True
        elif(is_bigger_than_threshold[i - 1] == True):
            val = False
    first_bigger_than_threshold.append(val)
This gives me the result I was looking for, namely
[False, False, True, False, False, False, False, False, False, True, False, False, False,
False, False, False, True, False, False, False, False, False]
In MATLAB I would do something similar:
for i = 2 : numel(is_bigger_than_threshold)
    if(is_bigger_than_threshold(i) == 0)
        val = 0;
    elseif(is_bigger_than_threshold(i))
        if(is_bigger_than_threshold(i - 1) == 0)
            val = 1;
        elseif(is_bigger_than_threshold(i - 1) == 1)
            val = 0;
        end
    end
    first_bigger_than_threshold(i) = val;
end % for
Is there a more efficient (faster) way to perform this calculation?
If I generate data in Python, e.g.
signal = [round(random.random() * 10) for i in xrange(0, 1000000)]
and time it, calculating these values takes 4.45 seconds. If I generate data in MATLAB
signal = round(rand(1, 1000000) * 10);
and execute the program, it takes only 0.92 seconds.
Why is MATLAB almost 5 times quicker than Python performing this task?
Thanks in advance for your comments!
The other answers give you the positions of the first Trues. If you want a bool array that marks the first True of each run, you can do it faster:
import numpy as np
signal = np.random.rand(1000000)
th = signal > 0.5
th[1:][th[:-1] & th[1:]] = False
This post explains why your code is slower than Matlab.
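To check the masking idea against the example signal from the question, here is a minimal sketch. It uses a copy plus an in-place `&=` instead of the nested indexing above (my own variation, not the answerer's exact code), and the names `above` and `first_above` are chosen only for illustration:

```python
import numpy as np

signal = np.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])
threshold = 2

above = signal > threshold

# A sample starts a run if it is above the threshold and its predecessor is not.
first_above = above.copy()
first_above[1:] &= ~above[:-1]

# Positions of the first sample of each run above the threshold.
print(np.flatnonzero(first_above))
```

For this signal the marked positions are 2, 9 and 16, matching the loop in the question. One difference to be aware of: the loop always sets the first element to False, whereas this sketch (like the answer above) leaves index 0 marked if the signal already starts above the threshold.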
Try this code
import numpy as np
threshold = 2
signal = np.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])
indices_bigger_than_threshold = np.where(signal > threshold)[0]  # indices of all samples above the threshold
print(indices_bigger_than_threshold)
# [ 2 3 4 5 9 16 17 18 19 20]
non_consecutive = np.where(np.diff(indices_bigger_than_threshold) != 1)[0] + 1  # +1 selects the index after each gap
print(non_consecutive)
# [4 5]
first_bigger_than_threshold1 = np.zeros_like(signal, dtype=bool)  # plain bool; the np.bool alias was removed in newer NumPy
first_bigger_than_threshold1[indices_bigger_than_threshold[0]] = True  # retain the first
first_bigger_than_threshold1[indices_bigger_than_threshold[non_consecutive]] = True
np.where returns the indices that match the condition. The strategy is to get the indices of all samples above the threshold and then drop every index that directly follows another one, so only the start of each run remains.
BTW, welcome to Python/Numpy world.
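The index-based strategy described above can also be wrapped in a small function. This is only a sketch: the function name `first_crossings` is hypothetical, and the early return for the case where nothing exceeds the threshold is an addition of mine (the snippet above would raise an IndexError there).

```python
import numpy as np

def first_crossings(signal, threshold):
    """Return the index of the first sample of each run above `threshold`."""
    idx = np.flatnonzero(np.asarray(signal) > threshold)
    if idx.size == 0:
        return idx  # nothing exceeds the threshold
    # Keep an index if it is the very first one, or if it does not
    # directly follow the previous index (i.e. a new run starts here).
    keep = np.concatenate(([True], np.diff(idx) != 1))
    return idx[keep]

example = [1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0]
print(first_crossings(example, 2))
```

Returning indices rather than a boolean array keeps the result small when crossings are rare; a mask can still be built from it by assigning True at those positions.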
Based on the notion that the best way to speed things up is to pick the best algorithm, you can do this neatly with a simple edge detector:
import numpy
signal = numpy.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])
threshold = 2
thresholded_data = signal > threshold
threshold_edges = numpy.convolve([1, -1], thresholded_data, mode='same')
thresholded_edge_indices = numpy.where(threshold_edges==1)[0]
print(thresholded_edge_indices)
This prints [2 9 16], the indices corresponding to the first entry in each sequence greater than the threshold. This will make things faster in both MATLAB and Python (with NumPy): on my machine it takes about 12 ms to do what took you 4.5 s.
Edit: As pointed out by @eickenberg, the convolution can be replaced with numpy.diff(thresholded_data), which is conceptually a bit simpler, though in that case the indices will be out by 1, so remember to add those back in, and also to convert thresholded_data to be an array of ints with thresholded_data.astype(int). There is no appreciable speed difference between the two methods.
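A minimal sketch of the numpy.diff variant mentioned in the edit, assuming the same example signal and threshold = 2 from the question (the variable `rising` is introduced here only for illustration):

```python
import numpy as np

signal = np.array([1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0])
threshold = 2

# Convert to ints first: on a boolean array numpy.diff marks every change,
# so rising and falling edges could not be told apart.
thresholded_data = (signal > threshold).astype(int)

# diff[i] = thresholded_data[i + 1] - thresholded_data[i]; +1 is a rising edge.
rising = np.diff(thresholded_data) == 1

# Shift by one so each index points at the first sample above the threshold.
thresholded_edge_indices = np.flatnonzero(rising) + 1
print(thresholded_edge_indices)
```

Note that this variant cannot report a crossing at index 0, because the first sample has no predecessor to take a difference with. If the signal may start above the threshold, that case needs to be handled separately.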