I have a vector that looks like this:
out1[1:200]
[1] NA NA NA NA 0 1 2 NA NA NA 1 NA 0 NA 0 1 NA NA 0 NA 0 1 2 2 2 NA 0 1 2 3 4 4 5 6 7 8 9 9 9 9
[41] 10 11 NA 0 0 NA 1 NA 0 1 NA 0 NA 0 1 2 NA 1 NA 0 0 0 1 2 NA NA NA 0 0 NA 0 0 0 1 2 NA 1 2 NA 0
[81] 1 2 3 4 5 6 7 8 NA 0 1 2 3 4 NA 0 1 2 2 3 4 5 NA 0 1 2 3 3 4 5 5 6 7 NA 1 2 NA 1 2 NA
[121] 0 1 2 NA 1 2 3 3 3 3 4 NA 0 0 0 1 2 3 4 5 NA NA 0 1 NA NA NA 1 2 2 3 NA 1 2 2 2 NA NA 0 1
[161] NA 1 NA 1 2 NA 0 0 NA NA 0 1 NA NA NA NA 1 2 3 NA NA 1 2 3 4 5 6 NA 1 2 3 4 5 6 6 7 8 NA 0 1
I now want to subset a df (with the same length) by this vector, keeping only those sequences whose values span at least 5 consecutive numbers, e.g. 0:4 or 1:5 (and of course anything longer than this). Hence, NAs should be FALSE as well.
E.g.
out1: NA NA 0 1 2 2 NA 0 0 1 2 3 3 4 NA
Then the result should be
out2: F F F F F F F T T T T T T T F
The way you tell R that you want to select some particular elements (i.e., a 'subset') from a vector is by placing an 'index vector' in square brackets immediately following the name of the vector. For a simple example, try x[1:10] to view the first ten elements of x.
The most general way to subset a data frame by rows and/or columns is the base R [ operator (documented under ?Extract), which uses matched square brackets instead of the usual matched parentheses. For a data frame named d the general format is d[rows, columns].
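As a small illustration of row subsetting with a logical index, here is a minimal sketch. The data frame d and the vector keep are made up for this example. It shows why NAs in the index need to be turned into FALSE first: an NA in a logical row index does not drop the row, it returns a row filled with NA.

```r
# Hypothetical data, for illustration only
d    <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))
keep <- c(TRUE, NA, FALSE, TRUE, NA, TRUE)

# Indexing with NAs still in the index: each NA yields an all-NA row
raw <- d[keep, ]

# Treat NA as FALSE before subsetting
keep_clean <- !is.na(keep) & keep
res <- d[keep_clean, ]
```

Here raw has five rows (two of them all NA), while res contains only the rows with id 1, 4 and 6. Wrapping the index in which() is another common way to drop the NAs.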
The following gives the desired result:
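library(data.table) # needs rleid(): v >= 1.9.5 (devel version when this was written - install from GitHub)
data.table(x)[, id := rleid(!is.na(x))   # id for each run of NA / non-NA values
  ][, aa := (.N > 5), by = id            # flags runs with more than 5 elements (constant within id)
  ][, aaa := 4 %in% cumsum(diff(unique(sort(x)))), by = .(id, aa)  # TRUE if the run's sorted unique values accumulate a difference of 4
  ]$aaa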
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Data:
x <- c(NA, NA, NA, NA, NA, 0, 1, 2, NA, 0, 1, 2, 3, 4, 4, 5, NA, 1, 2, 3, 3, 3, 3, 4, NA)
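For comparison, the same idea can be sketched in base R with rle(). This is not the answer's method but a swapped-in variant: it checks max - min >= 4 within each non-NA run, which only corresponds to "5 consecutive numbers" under the assumption that values within a run never jump by more than 1 (as in the data shown). The helper name flag_runs and its width argument are hypothetical.

```r
# Hypothetical helper: TRUE for every element of a non-NA run whose values
# span at least `width` consecutive numbers (e.g. 0:4 or 1:5 for width = 5).
# Assumes values within a run change in steps of at most 1.
flag_runs <- function(x, width = 5) {
  r  <- rle(!is.na(x))
  id <- rep(seq_along(r$lengths), r$lengths)  # run id, similar to rleid()
  ok <- vapply(split(x, id), function(v) {
    v <- v[!is.na(v)]                         # NA runs become empty
    length(v) > 0 && (max(v) - min(v)) >= width - 1
  }, logical(1))
  unname(ok[id])                              # expand back to one flag per element
}

x <- c(NA, NA, NA, NA, NA, 0, 1, 2, NA, 0, 1, 2, 3, 4, 4, 5,
       NA, 1, 2, 3, 3, 3, 3, 4, NA)
keep <- flag_runs(x)

# Subset a data frame of the same length (df is made up for this example)
df  <- data.frame(x = x)
sub <- df[keep, , drop = FALSE]
```

On this x, keep is TRUE only for positions 10 to 16, matching the output above, and NAs come out as FALSE, so the result can be used directly as a row index.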