
Identifying (and removing) sequences from a vector in Matlab/Octave

Tags: matlab, octave
I'm trying to prune any run of consecutive integers of length 3 or more from a vector of numbers in Matlab (or Octave). For example, given the vector dataSet,

dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];

removing all sequences of length 3 or more would yield prunedDataSet:

prunedDataSet = [7 9 11 13 22 28 30 31 ];

I can brute force a solution, but I suspect there is a more succinct (and perhaps more efficient) way to do it using vector/matrix operations. My problem is that I always get confused about whether an expression yields an index or the value at that index. Suggestions?

Here's the brute force method I came up with:

dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];

% Indices at which three consecutive integers start
benign = [];
for i = 1:numel(dataSet)-2
    if dataSet(i) == dataSet(i+1)-1 && dataSet(i) == dataSet(i+2)-2
        benign = [benign i];
    end
end

% Collect the indices of all three elements of each such triple
remove = [];
for i = 1:numel(benign)
    remove = [remove benign(i) benign(i)+1 benign(i)+2];
end

remove = unique(remove);

% setdiff compares values, so this relies on dataSet being sorted and duplicate-free
prunedDataSet = setdiff(dataSet, dataSet(remove));
jhfrontz asked Dec 28 '22 19:12


2 Answers

Here's a solution using DIFF and FINDSTR (in newer MATLAB releases, STRFIND is the recommended replacement for FINDSTR):

%# define dataset
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];

%# take the difference. Whatever is part of a sequence will have difference 1
dds = diff(dataSet);

%# sequences of 3 lead to two consecutive ones. Sequences of 4 are like two sequences of 3
seqIdx = findstr(dds,[1 1]);

%# remove start, start+1, start+2
dataSet(bsxfun(@plus,seqIdx,[0;1;2])) = []
dataSet =

     7     9    11    13    22    28    30    31
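
The same idea can be extended to an arbitrary minimum run length. The following is only a sketch and not part of the answer above: the variable names `minLen`, `isStart`, `runId` and `runLen` are assumptions introduced for illustration. It relies only on `diff`, `cumsum` and `accumarray`, and it assumes `dataSet` is a row vector.

```octave
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
minLen = 3;   % assumed parameter: shortest run length to remove

% A new run starts at the first element and wherever the step is not 1
isStart = [true, diff(dataSet) ~= 1];

% Label each element with the number of the run it belongs to
runId = cumsum(isStart);

% Count the elements in each run (accumarray needs a column of subscripts)
runLen = accumarray(runId(:), 1).';

% Keep only the elements whose run is shorter than minLen
prunedDataSet = dataSet(runLen(runId) < minLen);
```

Because the final step indexes with a logical mask rather than deleting by position, the order of the surviving elements is preserved and overlapping matches need no special handling.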
Jonas answered Mar 02 '23 01:03


Here's an attempt using vector-matrix notation:

s1 = [(dataSet(1:end-1) == dataSet(2:end)-1), false];
s2 = [(dataSet(1:end-2) == dataSet(3:end)-2), false, false];
s3 = s1 & s2;
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];
dataSet(~s)

The idea is: s1 is true at every position where a number a appears immediately before a+1. s2 is true at every position where a appears two positions before a+2. s3 is then true where both of these conditions are met, that is, at the first element of three consecutive integers. Finally, we build s so that every true value in s3 is propagated to its two successors.

dataSet(~s) then keeps all the values for which s is false, that is, it keeps the numbers that are not part of a run of three or more consecutive integers.
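
To see what each mask contains, here is a small trace on a hypothetical five-element vector `v` (this vector is not from the question and is chosen only for illustration). It also touches on the index-versus-value confusion mentioned in the question: `s` is a logical mask, `find(s)` converts it to indices, and `v(s)` or `v(~s)` returns values.

```octave
v = [4 5 6 9 10];   % assumed example input

s1 = [(v(1:end-1) == v(2:end)-1), false];          % [1 1 0 1 0]
s2 = [(v(1:end-2) == v(3:end)-2), false, false];   % [1 0 0 0 0]
s3 = s1 & s2;                                      % [1 0 0 0 0]
s  = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];   % [1 1 1 0 0]

removedIdx = find(s);   % indices of the removed elements: [1 2 3]
removedVal = v(s);      % values of the removed elements:  [4 5 6]
keptVal    = v(~s);     % values that survive:             [9 10]
```

Note that 9 and 10 are consecutive, so s1 is true at position 4, but s2 is false there, which is why the pair is kept.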

Pablo answered Mar 02 '23 00:03