Here's a sample of booleans I have as part of a data.frame:
atest <- c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE)
I want to return a sequence of numbers starting at 1 from each FALSE and increasing by 1 until the next FALSE.
The resulting desired vector is:
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1
Here's the code that accomplishes this, but I'm sure there's a simpler or more elegant way to do this in R. I'm always trying to learn how to code things more efficiently in R rather than simply getting the job done.
result <- c()
x <- 1
for (i in seq_along(atest)) {
  # a FALSE restarts the count at 1
  if (atest[i] == FALSE) {
    result[i] <- 1
    x <- 1
  }
  # a TRUE continues the current run
  if (atest[i] != FALSE) {
    x <- x + 1
    result[i] <- x
  }
}
Here's one way to do it, using handy (but not widely-known/used) base functions:
> sequence(tabulate(cumsum(!atest)))
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1
To break it down:
> # return/repeat integer for each FALSE
> cumsum(!atest)
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
> # count the number of occurrences of each integer
> tabulate(cumsum(!atest))
[1] 10 10 1
> # create concatenated seq_len for each integer
> sequence(tabulate(cumsum(!atest)))
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1
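One caveat that may be worth checking, sketched here with a hypothetical input that begins with TRUE (the sample in the question does not): cumsum(!x) is 0 for any leading TRUEs, and tabulate() ignores non-positive values, so those elements drop out of the result. The ave() variant below is a different base R technique, not part of the answer above, shown only as one possible way to keep the output the same length as the input.

```r
# Hypothetical input that starts with TRUE (not from the question).
lead_true <- c(TRUE, TRUE, FALSE, TRUE)

grp <- cumsum(!lead_true)    # 0 0 1 1
counts <- tabulate(grp)      # the zeros are ignored, leaving one bin of size 2
res <- sequence(counts)      # 1 2 (shorter than the input)

# Alternative (an assumption, not the answer's method): number the elements
# within each cumsum group, so the leading TRUEs form their own group.
kept <- ave(seq_along(lead_true), grp, FUN = seq_along)   # 1 2 1 2
```

Whether a leading run of TRUEs should start at 1 or at 2 (as the loop in the question would produce) depends on what is wanted; the question's sample does not say.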
Here is another approach using other familiar functions:
seq_along(atest) - cummax(seq_along(atest) * !atest) + 1L
Because it uses only vectorized operations, it is noticeably faster than @Joshua's solution in the benchmark below (if speed is of any concern):
f0 <- function(x) sequence(tabulate(cumsum(!x)))
f1 <- function(x) {i <- seq_along(x); i - cummax(i * !x) + 1L}
x <- rep(atest, 10000)
library(microbenchmark)
microbenchmark(f0(x), f1(x))
# Unit: milliseconds
#   expr       min        lq    median        uq      max neval
#  f0(x) 19.386581 21.853194 24.511783 26.703705 57.20482   100
#  f1(x)  3.518581  3.976605  5.962534  7.763618 35.95388   100
identical(f0(x), f1(x))
# [1] TRUE
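To see why the cummax expression works, here is a rough breakdown on a shortened, hypothetical vector (the names short, i, last_false and res are only for illustration). The idea is that cummax carries forward the index of the most recent FALSE, so subtracting it from the current index gives the distance from that FALSE.

```r
# Shortened, hypothetical sample (not the vector from the question).
short <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

i <- seq_along(short)                # 1 2 3 4 5
# The index is kept where short is FALSE and zeroed elsewhere: 1 0 0 4 0.
# cummax then carries the last FALSE position forward:         1 1 1 4 4
last_false <- cummax(i * (!short))
# Distance from the most recent FALSE, counting from 1:        1 2 3 1 2
res <- i - last_false + 1L
```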