I have some audio data loaded in a numpy array, and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude stays below a certain threshold over a period of time.
An extremely simple way to do this is something like this:
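```python
import re

# "1" marks a sample whose magnitude is below the threshold, "0" otherwise
values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    # code goes here
    pass
```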
The code above finds parts where there are at least MIN_SILENCE consecutive elements smaller than SILENCE_THRESHOLD.
Now, obviously, the above code is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?
Steps to find the most frequent value in a NumPy array: create a NumPy array, apply NumPy's bincount() to count the occurrences of each value, then apply argmax() to those counts to get the value with the highest number of occurrences (frequency). Note that bincount() only accepts arrays of non-negative integers.
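A minimal sketch of those steps, assuming a 1-D array of non-negative integers (the helper name `most_frequent` is hypothetical). Because argmax() returns the first maximum, ties resolve to the smallest value.

```python
import numpy as np

def most_frequent(arr):
    """Hypothetical helper: most frequent value in a 1-D array of
    non-negative integers (np.bincount rejects negatives and floats)."""
    counts = np.bincount(arr)   # counts[v] == number of times v occurs
    return int(np.argmax(counts))  # first index with the highest count

a = np.array([1, 3, 2, 3, 7, 3, 1])
print(most_frequent(a))  # 3
```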
NumPy's maximum() function computes the element-wise maximum of array elements: it compares two arrays and returns a new array containing the element-wise maxima.
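A short illustration of the element-wise behaviour, contrasted with np.max(), which reduces a single array to one value. The second call shows that a scalar is broadcast against the array.

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([4, 2, 3])

print(np.maximum(a, b))  # element-wise: [4 5 3]
print(np.maximum(a, 2))  # scalar broadcast against a: [2 5 3]
print(np.max(a))         # reduction over one array: 5
```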
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original. The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
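A small sketch of both points. "Growing" an array with np.append() really allocates a new array; the original object is left as-is and its memory is only reclaimed once nothing refers to it. Mixing ints and floats upcasts every element to a single dtype, so each element has the same itemsize.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)    # allocates a new array; a itself is not resized
print(a.size, b.size)  # 3 4

# One dtype for all elements: the int 1 is upcast to float64
c = np.array([1, 2.5])
print(c.dtype, c.itemsize)  # float64 8
```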
Here's a numpy-based solution.
I think (?) it should be faster than the other options. Hopefully it's fairly clear.
However, it does require roughly twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff) and a boolean array of the same length as your data (NumPy stores booleans as one byte per element, not one bit), it should be pretty efficient...
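To check the memory cost of the boolean mask, a quick sketch: a NumPy bool array uses one byte per element. If memory is really tight, np.packbits() can store the mask at one bit per element, though the solution below works on the unpacked array.

```python
import numpy as np

x = np.random.standard_normal(1_000_000)
condition = np.abs(x) < 1
print(condition.dtype, condition.itemsize)  # bool 1
print(condition.nbytes)  # 1000000, i.e. one byte per element

packed = np.packbits(condition)  # explicit 1-bit-per-element packing
print(packed.nbytes)  # 125000
```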
```python
import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indices of each region where the absolute
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index (exclusive, so x[start:stop] works)."""

    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()

    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True, prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    idx = idx.reshape(-1, 2)
    return idx

if __name__ == "__main__":
    main()
```
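Applied to the original question, the same diff/nonzero idea can also filter out runs shorter than MIN_SILENCE. The sketch below is a variant, not the answer's own function: it pads the mask with False on both ends instead of the conditional prepend/append, so every quiet run yields exactly one rising and one falling edge. The names `silent_regions`, `threshold`, and `min_length` are hypothetical stand-ins for SILENCE_THRESHOLD and MIN_SILENCE.

```python
import numpy as np

def silent_regions(samples, threshold, min_length):
    """Hypothetical helper: [start, stop) index pairs of runs where
    abs(samples) < threshold holds for at least min_length samples."""
    quiet = np.abs(samples) < threshold
    # Pad with False so runs touching either end still produce two edges
    padded = np.concatenate(([False], quiet, [False]))
    edges = np.flatnonzero(np.diff(padded))  # positions where quiet flips
    regions = edges.reshape(-1, 2)           # rows of [start, stop)
    lengths = regions[:, 1] - regions[:, 0]
    return regions[lengths >= min_length]

samples = np.array([5, 0, 0, 0, 5, 0, 5])
print(silent_regions(samples, threshold=1, min_length=2))  # [[1 4]]
```

Each returned row can be used directly as a slice, e.g. `samples[start:stop]`, in place of the regex match objects from the question.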