Is it possible to find the non nan values of a vector but also allowing n number of nans? For example, if I have the following data:
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array
thres = 1; % this is the number of nans to allow
and I would like to only keep the longest sequence of values with non nans but allow 'n' number of nans to be kept in the data. So, say that I am willing to keep 1 nan I would have an output of
X_out = [8 10 11 nan 9 14 6 1 4 23 24]; %// output array
Thats is, the two nans at the beginning have been removed becuase they exceed the values in 'thres' above, but the third nan is on its own thus can be kept in the data. I would like to develop a method where thres can be defined as any value.
I can find the non nan values with
Y = ~isnan(X); %// convert to zeros and ones
Any ideas?
In order to find the longest sequence containing at most threshold
times NaN
we must find the start and the end of said sequence(s).
To generate all possible start points, we can use hankel
:
H = hankel(X)
H =
18 3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24
3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0
NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0
NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0 0
8 10 11 NaN 9 14 6 1 4 23 24 0 0 0 0
10 11 NaN 9 14 6 1 4 23 24 0 0 0 0 0
11 NaN 9 14 6 1 4 23 24 0 0 0 0 0 0
NaN 9 14 6 1 4 23 24 0 0 0 0 0 0 0
9 14 6 1 4 23 24 0 0 0 0 0 0 0 0
14 6 1 4 23 24 0 0 0 0 0 0 0 0 0
6 1 4 23 24 0 0 0 0 0 0 0 0 0 0
1 4 23 24 0 0 0 0 0 0 0 0 0 0 0
4 23 24 0 0 0 0 0 0 0 0 0 0 0 0
23 24 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now we need to find the last valid element in each row.
To do so, we can use cumsum
:
C = cumsum(isnan(H),2)
C =
0 0 1 2 2 2 2 3 3 3 3 3 3 3 3
0 1 2 2 2 2 3 3 3 3 3 3 3 3 3
1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The end point for each row is the one, where the corresponding element in C
is at most threshold
:
threshold = 1;
T = C<=threshold
T =
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The last valid element is found using:
[~,idx]=sort(T,2);
lastone=idx(:,end)
lastone =
3 2 1 4 15 15 15 15 15 15 15 15 15 15 15
We must make sure that the actual length of each row is respected:
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length)
max_length =
11
max_idx =
5
In case there are more sequences of equal maximum length, we just take the first and display it:
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
ans =
8 10 11 NaN 9 14 6 1 4 23 24
full script
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
H = hankel(X);
C = cumsum(isnan(H),2);
threshold = 1;
T = C<=threshold;
[~,idx]=sort(T,2);
lastone=idx(:,end)';
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length);
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
One possible approach is to convolve Y = double(~isnan(X));
with a window of n
ones, where n
is decreased by until an acceptable subsequence is found. "Acceptable" means that the subsequence contains at least n-thres
ones, that is, the convolution gives at least n-thres
.
Y = double(~isnan(X));
for n = numel(Y):-1:1 %// try all possible sequence lengths
w = find(conv(Y,ones(1,n),'valid')>=n-thres); %// is there any acceptable subsequence?
if ~isempty(w)
break
end
end
result = X(w:w+n-1);
Convolving Y
with a window of n
ones (as in approach 1) is equivalent to computing a cumulative sum of Y
and then taking differences with n
spacing. This is more efficient in terms of number of operations.
Y = double(~isnan(X));
Z = cumsum(Y);
for n = numel(Y):-1:1
w = find([Z(n) Z(n+1:end)-Z(1:end-n)]>=n-thres);
if ~isempty(w)
break
end
end
result = X(w:w+n-1);
This essentially computes all iterations of the loop in approach 1 at once.
Y = double(~isnan(X));
z = conv2(Y, tril(ones(numel(Y))));
[nn, ww] = find(bsxfun(@ge, z, (1:numel(Y)).'-thres)); %'
[n, ind] = max(nn);
w = ww(ind)-n+1;
result = X(w:w+n-1);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With