This is my vector:
myvector<-c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)
As you can see, there are some sequences inside this vector:
Seq1: 1,2
Seq2: 4,5,6,7,8
Seq3: 10
Seq4: 12
Seq5: 142,143
Seq6: 149,150
I'm trying to write code that identifies these sequences and extracts the last element of each one. The result should be:
output<- c(2L, 8L,10L,12L, 143L, 150L)
I have other, bigger vectors, but if I can do this with this vector I will be able to do it with the others.
I tried to use diff, but the last element is dropped.
Any help guys?
We can create a grouping vector with diff and cumsum, then use that in tapply to extract the last element of each group
unname(tapply(myvector, cumsum(c(TRUE, diff(myvector) != 1)),
FUN = tail, 1))
#[1] 2 8 10 12 143 150
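To see how the grouping vector comes about, the intermediate steps can be printed one at a time. This is only a sketch; the names d, breaks and grp are introduced here for illustration and are not part of the answer above. Note that diff returns one element fewer than its input, which is why a leading TRUE is prepended.

```r
myvector <- c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)

# diff() returns length(myvector) - 1 differences
d <- diff(myvector)
length(d)
#[1] 12

# TRUE marks the start of a new run; the first element always starts one
breaks <- c(TRUE, d != 1)

# the cumulative sum of the run starts gives one group id per element
grp <- cumsum(breaks)
grp
#[1] 1 1 2 2 2 2 2 3 4 5 5 6 6
```

Each distinct value in grp corresponds to one of the six sequences listed in the question, so taking the last element per group gives the desired output.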
Or another simple option is
by(myvector, cumsum(c(TRUE, diff(myvector) != 1)), FUN = tail, 1)
Or an option is to split into a list, then extract the last element by looping through the list
lst1 <- split(myvector, cumsum(c(TRUE, diff(myvector) != 1)))
unname(sapply(lst1, tail, 1))
#[1] 2 8 10 12 143 150
Or create a grouping column in a data.frame/tibble and then do a regular slice/filter
library(tidyverse)
tibble(val = myvector, grp = cumsum(c(TRUE, diff(val) != 1))) %>%
group_by(grp) %>%
slice(n()) %>%
pull(val)
#[1] 2 8 10 12 143 150
Here is another solution just with subsetting
myvector<-c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)
myvector[which(diff(myvector) == 1)[!diff(which(diff(myvector, lag=1) == 1) + 1) == 1] + 1]
which(diff(myvector) == 1)
[1] 1 3 4 5 6 10 12
!diff(which(diff(myvector, lag=1) == 1) + 1) == 1
Notice that subsetting with this gives a subset of the vector of positions above:
[1] 1 6 10 12
+1
[1] 2 7 11 13
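These are the indices of the last elements of the sequences! :) Note that this only covers sequences with at least two values, so the single values 10 and 12 from the desired output are not returned.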
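One detail worth spelling out: the logical index is one element shorter than the vector it subsets, so the one-liner appears to rely on R recycling the shorter logical index. With this vector the recycled value happens to be TRUE, but that depends on the first element of the index. The sketch below makes the lengths visible and shows a variant that appends TRUE explicitly instead; seqs, keep and keep_explicit are illustrative names only.

```r
myvector <- c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)

seqs <- which(diff(myvector) == 1)   # positions followed by a consecutive value
keep <- diff(seqs) != 1              # same as !diff(seqs + 1) == 1
length(seqs)
#[1] 7
length(keep)
#[1] 6

# appending TRUE keeps the last position without relying on recycling
keep_explicit <- c(keep, TRUE)
myvector[seqs[keep_explicit] + 1]
#[1] 2 8 143 150
```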
To avoid computing which(diff(myvector) == 1) twice, store it in a variable first:
seqs <- which(diff(myvector) == 1)
myvector[seqs[!diff(seqs + 1) == 1] + 1]
microbenchmark::microbenchmark({seqs <- which(diff(myvector) == 1)
myvector[seqs[!diff(seqs + 1) == 1] + 1]})
Unit: microseconds
expr
{ seqs <- which(diff(myvector) == 1) myvector[seqs[!diff(seqs + 1) == 1] + 1] }
    min      lq    mean median      uq    max neval
 11.773 12.3345 13.2772 12.473 12.7435 68.969   100
microbenchmark::microbenchmark({myvector[which(diff(myvector) == 1)[!diff(which(diff(myvector, lag=1) == 1) + 1) == 1] + 1]})
Unit: microseconds
expr
{ myvector[which(diff(myvector) == 1)[!diff(which(diff(myvector, lag = 1) == 1) + 1) == 1] + 1] }
    min     lq     mean  median     uq    max neval
 17.721 18.295 19.44263 18.5855 18.926 82.875   100
Even simpler, since we do not need to check whether a value is part of a sequence or not: we keep every value whose successor breaks the "sequence", i.e. where the difference to the next value is not 1. The final value is always included. It either ends a sequence or is a single value, and in both cases we know that no consecutive integer follows it.
myvector<-c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)
# Test with different vector
myvector2<-c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 148L, 150L)
lastSeq <- function(vector){
  # keep the positions where the next value is not consecutive, plus the final position
  return(vector[c(which(diff(vector) != 1), length(vector))])
}
lastSeq(myvector)
lastSeq(myvector2)
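For reference, these are the results the two calls should give, worked out by hand from the definition above, together with a cross-check against the tapply approach. The helper name lastByGroup is hypothetical and only used for this comparison.

```r
lastSeq <- function(vector) {
  vector[c(which(diff(vector) != 1), length(vector))]
}

# hypothetical helper wrapping the tapply approach, for comparison only
lastByGroup <- function(vector) {
  as.vector(tapply(vector, cumsum(c(TRUE, diff(vector) != 1)), FUN = tail, 1))
}

myvector  <- c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 149L, 150L)
myvector2 <- c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 10L, 12L, 142L, 143L, 148L, 150L)

lastSeq(myvector)
#[1] 2 8 10 12 143 150
lastSeq(myvector2)
#[1] 2 8 10 12 143 148 150

# both approaches should agree on both vectors
all(lastSeq(myvector) == lastByGroup(myvector))
#[1] TRUE
all(lastSeq(myvector2) == lastByGroup(myvector2))
#[1] TRUE
```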