I have an hourly value. I want to count how many consecutive hours the value has been zero since the last time it was not zero. This is an easy job for a spreadsheet or a for loop, but I am hoping for a snappy vectorized one-liner to accomplish the task.
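x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)
# first row: 1 if x is zero, otherwise 0
df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
# later rows: extend the previous count on a zero, reset to 0 otherwise
for (i in 2:nrow(df))
  df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)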
Desired output:
R> df
   x zcount
1  1      0
2  0      1
3  1      0
4  0      1
5  0      2
6  0      3
7  1      0
8  1      0
9  0      1
10 0      2
A: Use the rle() function, which computes the run-length encoding of a vector. It returns a list of two elements, lengths and values: values gives the value of each run, and lengths gives the length of that run, i.e. the number of consecutive repeats within it.
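Applied to this question, a minimal sketch could look like the following. It assumes x is the vector from the question, and the function name zcount_rle is hypothetical. It encodes the runs of x == 0, numbers the positions inside every run with sequence(), and then zeroes out the runs of nonzero values:

```r
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# hypothetical helper: count consecutive zeros via run-length encoding
zcount_rle <- function(x) {
  r <- rle(x == 0)  # runs of TRUE (zero) and FALSE (nonzero)
  # sequence() numbers the positions 1..k inside each run;
  # multiplying by the expanded run values keeps counts only inside zero runs
  sequence(r$lengths) * rep(r$values, r$lengths)
}

# for this x, rle(x == 0) has lengths 1 1 1 3 2 2
# and values FALSE TRUE FALSE TRUE FALSE TRUE
zcount_rle(x)
```

The result is an integer vector, so it could be stored with df$zcount <- zcount_rle(df$x).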
A: William Dunlap's posts on R-help are the place to look for anything related to run lengths. His f7 from one of those posts is
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }
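and in the current situation you call it as f7(!x), because f7 counts consecutive TRUE values (resetting at each FALSE) and !x is TRUE exactly where x is zero. In terms of performance, on a vector of one million 0/1 values f7 is about 4.5 times faster than cumul_zeros in elapsed time: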
> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
   user  system elapsed
  0.076   0.000   0.077
> system.time(res0 <- cumul_zeros(x))
   user  system elapsed
  0.345   0.003   0.349
> identical(res7, res0)
[1] TRUE
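To see why f7 works, here is the same computation unrolled on the question's x, followed by a check against the question's for loop on random input. This is an illustrative sketch: the intermediate names y, tmp and resets and the wrapper zcount_loop are hypothetical, not from the post. With y marking the zeros, cumsum(y) counts every zero seen so far; (!y) * tmp keeps that running total only at nonzero positions, and cummax() carries the latest such total forward, so subtracting it restarts the count after each nonzero value.

```r
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }

# step-by-step trace on the example from the question
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
y <- !x                      # TRUE where x is zero
tmp <- cumsum(y)             # zeros seen so far:         0 1 1 2 3 4 4 4 5 6
resets <- cummax((!y) * tmp) # total at last nonzero x:   0 0 1 1 1 1 4 4 4 4
counts <- tmp - resets       # zeros since last nonzero:  0 1 0 1 2 3 0 0 1 2
stopifnot(identical(counts, f7(!x)))

# hypothetical wrapper around the loop from the question (assumes length(x) >= 1)
zcount_loop <- function(x) {
  z <- numeric(length(x))
  z[1] <- if (x[1] == 0) 1 else 0
  for (i in seq_along(x)[-1])
    z[i] <- if (x[i] == 0) z[i - 1] + 1 else 0
  z
}

# f7 returns integers and the loop returns doubles, so compare values, not types
set.seed(1)
x_rand <- sample(0:1, 10000, TRUE)
stopifnot(all(f7(!x_rand) == zcount_loop(x_rand)))
```

The random check uses a shorter vector than the benchmark because the loop is much slower than f7.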