I have data like this:
1 0 1
1 1 1
0 1 1
1 1 1
1 1 1
1 1 1
1 1 0
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
0 0 1
1 1 1
1 1 1
1 1 1
Each column represents a device, and each row represents a time period. Each data point indicates whether or not the device was active in that time period. I'm trying to calculate the length of each uptime "spell" for each device, that is, the length of each run of consecutive ones in each column. In this case, it would be 2 11 3 for the first column, and so on.
This is easy to do with one device (a single column of data):
rng(1)
%% Parameters
lambda = 0.05; % Pr(failure)
N = 1; % number of devices
T = 18; % number of time periods in sample
%% Generate example data
device_status = [rand(T, N) >= lambda ; false(1, N)];
%% Calculate spell lengths, i.e. duration of uptime for each device
cumul_status = cumsum(device_status);
% The 'cumul_status > 0' condition excludes the case where the vector begins with one
% or more zeros
cumul_uptimes = cumul_status(device_status == 0 & cumul_status > 0);
uptimes = cumul_uptimes - [0 ; cumul_uptimes(1:end-1)];
So I could simply iterate over the columns, handling one column at a time, and use parfor (for example) to run this in parallel. Is there a way to do this across all columns simultaneously, using vectorized matrix operations?
EDIT: I should add that this is complicated by the fact that each device may have a different number of uptime spells.
Here's a way. Not sure it counts as vectorized, though.
Let your data matrix be denoted as x. Then
[ii, jj] = find([true(1,size(x,2)); ~x; true(1,size(x,2))]);
result = accumarray(jj, ii, [], @(x){nonzeros(diff(x)-1)});
produces a cell array, where each cell corresponds to a column. In your example,
result{1} =
2
11
3
result{2} =
13
3
result{3} =
6
11
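Because each device can have a different number of spells, the cell array is a natural container for the ragged result. As a minimal sketch of working with it (the variable names n_spells and total_up are illustrative and not part of the original answer), per-device summaries can be pulled out with cellfun:

```matlab
% Example data from the question: rows are time periods, columns are devices
x = [1 0 1; 1 1 1; 0 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 0; 1 1 1; 1 1 1; ...
     1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 0 0 1; 1 1 1; 1 1 1; 1 1 1];

% Same computation as in the answer; the anonymous function's argument is
% renamed to v only to avoid confusion with the data matrix x
[ii, jj] = find([true(1,size(x,2)); ~x; true(1,size(x,2))]);
result = accumarray(jj, ii, [], @(v){nonzeros(diff(v)-1)});

% Hypothetical summaries: number of spells and total uptime per device
n_spells = cellfun(@numel, result);   % one entry per column of x
total_up = cellfun(@sum, result);     % should match sum(x,1).'
```

A quick sanity check is that total_up agrees with the column sums of x, since every active period belongs to exactly one spell.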
How this works
The idea is to find the row and column indices of zeros in x (that is, true values in ~x), and then use the column indices as grouping variables (first argument to accumarray).
Within each group we use the anonymous function @(x){nonzeros(diff(x)-1)} to compute the differences in row positions of zeros. We can apply diff directly because the column indices from find are already sorted, thanks to MATLAB's column-major order. We subtract 1 because the zeros in x don't count as part of the uptime; remove uptime lengths equal to 0 (with nonzeros), which arise from adjacent zeros; and pack the resulting vector in a cell ({...}).
A row of true values is appended and prepended to ~x to make sure we detect the initial and final uptime periods.
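To make the steps concrete, here is a sketch that unpacks the one-liner for the first device only, using the example data from the question. The intermediate names (padded, rows_col1, gaps_col1, spells_col1) are introduced purely for illustration:

```matlab
% Example data from the question: rows are time periods, columns are devices
x = [1 0 1; 1 1 1; 0 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 0; 1 1 1; 1 1 1; ...
     1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 0 0 1; 1 1 1; 1 1 1; 1 1 1];

% Mark downtime with true, and add a sentinel row of true above and below
% so that spells touching the first or last period are bounded on both sides
padded = [true(1,size(x,2)); ~x; true(1,size(x,2))];

% Row and column indices of all markers, returned in column-major order
[ii, jj] = find(padded);

% Markers for device 1: sentinel, the zeros in rows 3 and 15 (shifted by
% one because of the top sentinel), and the bottom sentinel
rows_col1 = ii(jj == 1);            % [1; 4; 16; 20]

% Number of active periods strictly between consecutive markers
gaps_col1 = diff(rows_col1) - 1;    % [2; 11; 3]

% Drop zero-length gaps (these would come from adjacent markers)
spells_col1 = nonzeros(gaps_col1);  % [2; 11; 3]
```

The accumarray call in the answer applies this same diff, subtract and nonzeros sequence to every column's group of row indices at once.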