How do I calculate the length of continuous occurrences of a value (uptimes) in a matrix?

I have data like this:

 1     0     1
 1     1     1
 0     1     1
 1     1     1
 1     1     1
 1     1     1
 1     1     0
 1     1     1
 1     1     1
 1     1     1
 1     1     1
 1     1     1
 1     1     1
 1     1     1
 0     0     1
 1     1     1
 1     1     1
 1     1     1

Each column represents a device, and each row represents a time period. Each data point indicates whether or not the device was active in that time period. I'm trying to calculate the length of each uptime, or "spell", during which each device was continuously active. In other words, the length of each run of consecutive ones in each column. In this case, it would be 2 11 3 for the first column, and so on.

This is easy to do with one device (a single column of data):

rng(1)

%% Parameters
lambda = 0.05;      % Pr(failure)
N = 1;              % number of devices
T = 18;             % number of time periods in sample

%% Generate example data
device_status = [rand(T, N) >= lambda ; false(1, N)];

%% Calculate spell lengths, i.e. duration of uptime for each device
cumul_status = cumsum(device_status);

% The 'cumul_status > 0' condition excludes the case where the vector begins with one
% or more zeros
cumul_uptimes = cumul_status(device_status == 0 & cumul_status > 0);
uptimes = cumul_uptimes - [0 ; cumul_uptimes(1:end-1)];

So I could simply iterate over the columns, doing this one column at a time, and use parfor (for example) to run the iterations in parallel. Is there a way to do this across all columns simultaneously, using vectorized matrix operations?

EDIT: I should add that this is complicated by the fact that each device may have a different number of uptime spells.
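
For reference, the column-by-column approach described above might look like the following sketch. It reuses the single-device code inside a loop and stores each device's spells in a cell array, since the number of spells can differ per device. The choice of N = 3 and the final filter that drops zero-length spells (which arise when a device is down for two consecutive periods) are assumptions added here, not part of the original code.

```matlab
rng(1)

lambda = 0.05;      % Pr(failure)
N = 3;              % number of devices (assumed value for illustration)
T = 18;             % number of time periods in sample

% Same construction as above: a trailing row of zeros closes the last spell.
device_status = [rand(T, N) >= lambda ; false(1, N)];

uptimes = cell(1, N);           % one cell per device, lengths may differ
for n = 1:N                     % parfor could be substituted here
    s  = device_status(:, n);
    c  = cumsum(s);
    cu = c(s == 0 & c > 0);     % cumulative uptime at each failure
    u  = cu - [0 ; cu(1:end-1)];
    uptimes{n} = u(u > 0);      % assumed extra step: drop zero-length spells
end
```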

Michael A asked Oct 18 '22 19:10


1 Answer

Here's a way. Not sure it counts as vectorized, though.

Let your data matrix be denoted as x. Then

[ii, jj] = find([true(1,size(x,2)); ~x; true(1,size(x,2))]);
result = accumarray(jj, ii, [], @(x){nonzeros(diff(x)-1)});

produces a cell array, where each cell corresponds to a column. In your example,

result{1} =
     2
    11
     3
result{2} =
    13
     3
result{3} =
     6
    11
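
To reproduce this, the example matrix from the question can be rebuilt and passed through the same two lines. This is only a sketch: the compact way of constructing x (start from all ones, then zero out the down periods) is a convenience chosen here, and the anonymous function's argument is renamed to v so it is not confused with the data matrix x.

```matlab
% Example data from the question: 18 time periods (rows), 3 devices (columns).
x = true(18, 3);
x([3 15], 1) = false;   % device 1 is down in periods 3 and 15
x([1 15], 2) = false;   % device 2 is down in periods 1 and 15
x(7, 3)      = false;   % device 3 is down in period 7

% Same two lines as above, with the anonymous argument renamed to v.
[ii, jj] = find([true(1,size(x,2)); ~x; true(1,size(x,2))]);
result = accumarray(jj, ii, [], @(v){nonzeros(diff(v)-1)});

celldisp(result)        % displays the contents of each cell
```

The displayed cells should match the values listed above.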

How this works

The idea is to find the row and column indices of zeros in x (that is, true values in ~x), and then use the column indices as grouping variables (first argument to accumarray).

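Within each group we use the anonymous function @(x){nonzeros(diff(x)-1)} to compute the differences in row positions of zeros (the x inside the anonymous function is its own input argument, the row indices of one group, not the data matrix). We can apply diff directly because find returns its indices in Matlab's column-major order: the column indices are already sorted, so accumarray keeps the values of each group in order, and the row indices within each column are increasing. We subtract 1 because the zeros in x don't count as part of the uptime, remove uptime lengths equal to 0 (with nonzeros), and pack the resulting vector in a cell ({...}).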

A row of true values is appended and prepended to ~x to make sure we detect the initial and final uptime periods.
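
The steps above can be traced on a single column without accumarray. The short vector below is a made-up example (not the question's data), chosen so that it contains two consecutive down periods and shows why nonzeros is needed; the variable names padded, gaps and spells are likewise only illustrative.

```matlab
% Hypothetical single-device column: up, up, down, up, down, down, up, up, up.
x = logical([1; 1; 0; 1; 0; 0; 1; 1; 1]);

% Step 1: mark downtime and add a sentinel "down" period before and after.
padded = [true; ~x; true];

% Step 2: row positions of every downtime marker (sentinels included).
ii = find(padded);          % [1; 4; 6; 7; 11]

% Step 3: distance between consecutive markers, minus the marker itself.
gaps = diff(ii) - 1;        % [2; 1; 0; 3]

% Step 4: a gap of 0 means two adjacent down periods, not an uptime spell.
spells = nonzeros(gaps);    % [2; 1; 3]
```

Without the two sentinel values, the first spell (length 2) and the last spell (length 3) would have no downtime marker on one side and would be missed.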

Luis Mendo answered Oct 31 '22 17:10
