Suppose we have a vector with missing values like the following:
test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)
The objective is to identify the series of NA with a length of 2 or less, so that linear interpolation can be applied to those series that have non-NA values at both ends. I was able to detect the index of the end of each NA series (except the last one) with this code:
which.na <- which(is.na(test))
diff.which.na <- diff(which.na)
which.diff.which.na <- which(diff.which.na>1)
end.index <- which.na[which.diff.which.na]
Result:
> end.index
[1] 3 7 11
The last NA series could be handled with a conditional statement. However, I can't find the index of the beginning of each NA series, because diff() only accepts a lag of 1 or more, so I can't do the following:
diff.which.na <- diff(which.na,lag=-1)
So the expected output is:
beg.index= c(3,6,11)
and
end.index=c(3,7,11)
Any ideas?
Thank you
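The diff-based approach from the question can also be completed without a negative lag. This is a minimal sketch, assuming the same test vector: a series starts at the first NA and at every NA that follows a gap in which.na (a difference greater than 1), and it ends at every NA that precedes such a gap and at the last NA. Appending TRUE on the appropriate side of the gap vector covers the first and last series, so no separate conditional is needed. The names gap, beg.all, end.all and keep are illustrative and not from the original post.

```r
test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)

which.na <- which(is.na(test))   # positions of all NAs
gap <- diff(which.na) > 1        # TRUE where the next NA belongs to a new series

# A series starts at the first NA and at every NA that follows a gap;
# it ends at every NA that precedes a gap and at the last NA.
beg.all <- which.na[c(TRUE, gap)]
end.all <- which.na[c(gap, TRUE)]

# Keep only the series of length 2 or less
keep <- (end.all - beg.all + 1) <= 2
beg.index <- beg.all[keep]
end.index <- end.all[keep]

beg.index
# [1]  3  6 11
end.index
# [1]  3  7 11
```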
You can try with rle:
seq_na <- rle(is.na(test))
seq_na
#Run Length Encoding
# lengths: int [1:8] 2 1 2 2 3 1 2 3
# values : logi [1:8] FALSE TRUE FALSE TRUE FALSE TRUE ...
Then look at the lengths of the runs of TRUE (the NA series); you want the ones with length 2 or less:
seq_na$lengths[seq_na$values]
# [1] 1 2 1 3   # three of these runs have length <= 2
To find the indices, you can use cumsum (thanks to @Frank for the improvement!):
end.index <- with(seq_na, cumsum(lengths)[lengths <= 2 & values])
#[1] 3 7 11
beg.index <- end.index - with(seq_na, +(lengths==2 & values)[lengths <= 2 & values])
#[1] 3 6 11
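Subtracting +(lengths == 2 & values) works here only because the selected runs are at most 2 long; subtracting lengths - 1 gives the start of a run of any length. Building on that, the interpolation the question is aiming at can be sketched in base R with approx(). This is a minimal sketch, assuming that "non-NA values at their extremities" means the run must not touch the first or last element of the vector; the names ends, begs, short and filled are illustrative.

```r
test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)

seq_na <- rle(is.na(test))
ends <- cumsum(seq_na$lengths)      # last position of every run
begs <- ends - seq_na$lengths + 1   # first position of every run (any length)

# NA runs of length <= 2 that have a non-NA neighbour on both sides
# (runs alternate NA / non-NA, so it suffices that they avoid the vector ends)
short <- seq_na$values & seq_na$lengths <= 2 &
  begs > 1 & ends < length(test)

filled <- test
for (i in which(short)) {
  left  <- begs[i] - 1   # last non-NA position before the gap
  right <- ends[i] + 1   # first non-NA position after the gap
  # linear interpolation between the two bracketing values
  filled[begs[i]:ends[i]] <- approx(c(left, right), test[c(left, right)],
                                    xout = begs[i]:ends[i])$y
}
filled
```

With this test vector the gaps at positions 3, 6-7 and 11 become 6.5, then 7 and 6, then 5, while the trailing run of three NAs is left untouched. If the zoo package is available, its na.approx() function with the maxgap argument is a packaged alternative worth checking.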