
Find large number of consecutive values fulfilling condition in a numpy array

I have some audio data loaded in a numpy array and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude is below a certain threshold over a period of time.

An extremely simple way to do this is something like this:

import re

values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    pass  # code goes here

The code above finds parts where there are at least MIN_SILENCE consecutive elements whose absolute value is below SILENCE_THRESHOLD.
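For example (a toy run with made-up values for samples, SILENCE_THRESHOLD and MIN_SILENCE, purely to illustrate the bitstring trick):

import re

# Toy illustration of the regex approach above; all values are made up
samples = [0.5, 0.02, 0.01, 0.03, 0.04, 0.7]
SILENCE_THRESHOLD = 0.1
MIN_SILENCE = 3

values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
# values == "011110"
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    print(match.start(), match.end())  # prints "1 5": samples 1-4 are silent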

Now, obviously, the above code is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?

asked Dec 20 '10 by pafcu


1 Answer

Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), plus a boolean array of the same length as your data (one byte per element in numpy), it should be pretty efficient...

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indices of each region where the absolute
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""

    # Find the indices of changes in "condition" (cast to int first, since
    # modern numpy disallows np.diff on boolean arrays)
    d = np.diff(condition.astype(int))
    idx, = d.nonzero()

    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True, prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    idx.shape = (-1, 2)
    return idx

main()
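As a minimal usage sketch for the original silence-detection question (SILENCE_THRESHOLD and MIN_SILENCE are the question's names; samples here is just dummy data standing in for the audio):

# Hypothetical application of contiguous_regions() to the question's problem:
# find runs of at least MIN_SILENCE consecutive samples whose absolute value
# is below SILENCE_THRESHOLD. The data and constants below are made up.
samples = np.cumsum(np.random.random(10000) - 0.5) * 0.01
SILENCE_THRESHOLD = 0.02
MIN_SILENCE = 100

silent = np.abs(samples) < SILENCE_THRESHOLD
for start, stop in contiguous_regions(silent):
    if stop - start >= MIN_SILENCE:
        # samples[start:stop] is a silent segment of at least MIN_SILENCE samples
        print("silence from sample %d to %d" % (start, stop))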
answered by Joe Kington