
How to identify the indexes of a series of NA in a vector

Tags:

r

Assuming we have a vector of values with missing values like the following:

test <- c(3,6,NA,7,8,NA,NA,5,8,6,NA,4,3,NA,NA,NA)

The goal is to identify the runs of NA that have a length of 2 or less, so that linear interpolation can be applied to the runs that have non-NA values on both sides. I was able to detect the index of the end of each such run with this code:

which.na <- which(is.na(test))

diff.which.na <- diff(which.na)

which.diff.which.na <- which(diff.which.na>1)

end.index <- which.na[which.diff.which.na]

result:

> end.index
[1]  3  7 11

The last NA series could be handled with a conditional statement. However, I'm not able to find the index of the beginning of an NA series, because I can't do the following:

diff.which.na <- diff(which.na,lag=-1)

So the expected output is:

beg.index= c(3,6,11)

and

end.index=c(3,7,11)

Any ideas?

Thank you

Samy Geronymos asked Jan 07 '23 00:01


1 Answer

You can try with rle:

seq_na <- rle(is.na(test))
seq_na
#Run Length Encoding
#  lengths: int [1:8] 2 1 2 2 3 1 2 3
#  values : logi [1:8] FALSE TRUE FALSE TRUE FALSE TRUE ...
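
To see how these run lengths map back to positions in the vector, here is a minimal sketch that tabulates the start and end of every run. The names `run_start`, `run_end` and `runs` are only illustrative and are not part of the original answer.

```r
test <- c(3, 6, NA, 7, 8, NA, NA, 5, 8, 6, NA, 4, 3, NA, NA, NA)
seq_na <- rle(is.na(test))

# The end position of each run is the running total of the run lengths;
# the start is the end minus the run length, plus one.
run_end <- cumsum(seq_na$lengths)
run_start <- run_end - seq_na$lengths + 1

# One row per run (illustrative layout, not from the original answer)
runs <- data.frame(is_na = seq_na$values,
                   start = run_start,
                   end = run_end,
                   length = seq_na$lengths)

# Keep only the NA runs
runs[runs$is_na, ]
```

For this vector the NA runs start at 3, 6, 11 and 14 and end at 3, 7, 11 and 16.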

And look for the runs of TRUE with length at most 2:

seq_na$lengths[seq_na$values]
# [1] 1 2 1 3 # three of them have length <= 2

To find the indices, you can use cumsum (thanks to @Frank for the improvement!):

end.index <- with(seq_na, cumsum(lengths)[lengths <= 2 & values])
#[1]  3  7 11

beg.index <- end.index - with(seq_na, +(lengths==2 & values)[lengths <= 2 & values])
#[1]  3  6 11
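
The same idea can be generalized to any maximum run length and to the question's condition that a run must have non-NA values on both sides. The following is a rough sketch under those assumptions, not the original answer's method: `short_na_runs` and `max_len` are hypothetical names, and `approx()` from base R is used here for the linear interpolation step that the question mentions.

```r
test <- c(3, 6, NA, 7, 8, NA, NA, 5, 8, 6, NA, 4, 3, NA, NA, NA)

# Hypothetical helper: start/end indices of the NA runs that are no longer
# than max_len and that have a non-NA value on both sides.
short_na_runs <- function(x, max_len = 2) {
  r <- rle(is.na(x))
  end <- cumsum(r$lengths)
  beg <- end - r$lengths + 1
  # A run touching the first or last element has no neighbour on one side
  keep <- r$values & r$lengths <= max_len & beg > 1 & end < length(x)
  data.frame(beg = beg[keep], end = end[keep])
}

gaps <- short_na_runs(test)
# gaps$beg: 3 6 11
# gaps$end: 3 7 11

# Interpolate linearly at every position, then copy the interpolated values
# into the short gaps only; longer or unbounded runs stay NA.
interp <- approx(seq_along(test), test, xout = seq_along(test))$y
idx <- unlist(Map(seq, gaps$beg, gaps$end))
filled <- test
filled[idx] <- interp[idx]
```

With the example vector, positions 3, 6, 7 and 11 are filled with 6.5, 7, 6 and 5, while the trailing run of three NA is left untouched.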
Cath answered Jan 30 '23 21:01