
Finding islands of zeros in a sequence

Imagine you have a very long sequence. What is the most efficient way to find the intervals where the sequence is all zeros (or, more precisely, where it drops to near-zero values, abs(X) < eps)?

For simplicity, let's assume the following sequence:

sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0]; 

I'm trying to get the following information:

startIndex   EndIndex    Duration
3            6           4
12           12          1
14           16          3
25           26          2
30           30          1

Then, using this information, we find the intervals with a duration greater than or equal to some specified value (say 3) and return the indices of the values in all of these intervals combined:

indices = [3 4 5 6 14 15 16]; 

That last part is related to a previous question:

MATLAB: vectorized array creation from a list of start/end indices

This is what I have so far:

sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
len = length(sig);
thresh = 3;

%# align the signal with itself successively shifted by one
%# v will thus contain 1 in the starting locations of the zero interval
v = true(1,len-thresh+1);
for i=1:thresh
    v = v & ( sig(i:len-thresh+i) == 0 );
end

%# extend the 1's till the end of the intervals
for i=1:thresh-1
    v(find(v)+1) = true;
end

%# get the final indices
v = find(v);

I'm looking to vectorize/optimize the code, but I'm open to other solutions. I have to stress that space and time efficiencies are very important, since I'm processing a large number of long bio-signals.

asked Jul 18 '10 02:07 by merv



1 Answer

These are the steps I would take to solve your problem in a vectorized way, starting with a given vector sig:

  • First, threshold the vector to get a vector tsig of zeros and ones (zeroes where the absolute value of the signal drops close enough to zero, ones elsewhere):

    tsig = (abs(sig) >= eps);  %# Using eps as the threshold 
  • Next, find the starting indices, ending indices, and duration of each string of zeroes using the functions DIFF and FIND:

    dsig = diff([1 tsig 1]);
    startIndex = find(dsig < 0);
    endIndex = find(dsig > 0)-1;
    duration = endIndex-startIndex+1;
  • Then, find the strings of zeroes with a duration greater than or equal to some value (such as 3, from your example):

    stringIndex = (duration >= 3);
    startIndex = startIndex(stringIndex);
    endIndex = endIndex(stringIndex);
  • Finally, use the method from my answer to the linked question to generate your final set of indices:

    indices = zeros(1,max(endIndex)+1);
    indices(startIndex) = 1;
    indices(endIndex+1) = indices(endIndex+1)-1;
    indices = find(cumsum(indices));
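
For reference, the four steps can be strung together into one script and run on the example signal from the question. This is only a sketch: the names tol and minDuration are illustrative and do not appear in the question, and using eps as the tolerance is an assumption you would likely replace with a value suited to your signals. The script also assumes at least one interval passes the duration filter.

```matlab
% Sketch combining the steps above; tol and minDuration are illustrative names.
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
tol = eps;          % assumed near-zero tolerance (machine epsilon here)
minDuration = 3;    % keep only zero runs at least this long

% Step 1: logical mask that is true where the signal is NOT near zero.
% sig(:).' forces a row vector so the padding below always works.
tsig = (abs(sig(:).') >= tol);

% Step 2: pad with ones so runs touching either end are still detected,
% then locate falling edges (run starts) and rising edges (run ends).
dsig = diff([1 tsig 1]);
startIndex = find(dsig < 0);
endIndex = find(dsig > 0) - 1;
duration = endIndex - startIndex + 1;

% Step 3: keep only the runs that are long enough.
keep = (duration >= minDuration);
startIndex = startIndex(keep);
endIndex = endIndex(keep);

% Step 4: expand the start/end pairs into a flat index list by marking
% +1 at each start and -1 just past each end, then taking a cumulative sum.
indices = zeros(1, max(endIndex) + 1);
indices(startIndex) = 1;
indices(endIndex + 1) = indices(endIndex + 1) - 1;
indices = find(cumsum(indices));

disp(indices)
```

With the example signal, the intermediate startIndex, endIndex and duration vectors (before filtering) match the table in the question, and the final indices are 3 4 5 6 14 15 16.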
answered Oct 05 '22 11:10 by gnovice
