In the example below, an event starts at the observation where the prior value of "values" is 90 or more and the current value is below 90. It ends at the observation where the current value is below 90 and the next value is 90 or more.
sequential_index <- seq(1,10)
values <- c(91,90,89,89,90,90,89,88,90,91)
df <- data.frame(sequential_index, values)
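The start and end conditions can be written as vectorized tests, with no for loop, by shifting values one step back (prior value) and one step forward (next value). Below is a minimal base R sketch. It assumes that the first and last observations, which have no prior or next value, never count as a start or an end. The names prev_val, next_val, is_start and is_end are only illustrative:

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Shift the vector by one position; the missing neighbour becomes NA.
prev_val <- c(NA, head(values, -1))
next_val <- c(tail(values, -1), NA)

# Event start: prior value >= 90 and current value < 90.
# "%in% TRUE" turns any NA comparison (first/last row) into FALSE.
is_start <- (prev_val >= 90 & values < 90) %in% TRUE

# Event end: current value < 90 and next value >= 90.
is_end <- (values < 90 & next_val >= 90) %in% TRUE

which(is_start)  # 3 7
which(is_end)    # 4 8
```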
In df above, the first event spans observations 3-4 and the second spans observations 7-8. I am trying, without success, to add an "events" column to df that numbers each event and is NA outside events, like this:
   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA
My dataset is rather large and I'm trying to avoid for loops.
Thanks in advance,
-jt
I have this solution using dplyr:
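library(dplyr)

df %>%
  # Flag the start of each event with 1 (prior value >= 90, current value < 90).
  # lag() gives NA on the first row; case_when() treats an NA condition as no match (0).
  mutate(events = case_when(lag(values) >= 90 & values < 90 ~ 1, TRUE ~ 0)) %>%
  # cumsum() carries each event number forward; keep it only on rows below 90
  mutate(events = case_when(values < 90 ~ cumsum(events)))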
Output:
   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA
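The cumsum() step is what makes this work without a loop. The start flags are 1 only on the first row of each event, so their running total stays constant through an event and increases by one at the next start. Here is a base R sketch of the same two steps without dplyr. Like the dplyr code, it assumes that a missing prior value on the first row never counts as a start. The names start and counter are only illustrative:

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Step 1: flag event starts (prior value >= 90, current value < 90).
prev_val <- c(NA, head(values, -1))
start <- as.integer((prev_val >= 90 & values < 90) %in% TRUE)
# start:   0 0 1 0 0 0 1 0 0 0

# Step 2: the running total labels every row with the number of the latest start.
counter <- cumsum(start)
# counter: 0 0 1 1 1 1 2 2 2 2

# Keep the label only on rows below the threshold; other rows become NA.
events <- ifelse(values < 90, counter, NA)
events
```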
One option with base R would be rle():
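# Run-length encode "below 90"; inside within.list(), `values` is the rle's
# logical values, so the TRUE runs are numbered 1, 2, ... and FALSE runs become 0
df$events <- inverse.rle(within.list(rle(df$values < 90),
  values[values] <- seq_along(values[values])
))
# Rows outside any event are 0; mark them as NA instead
df$events[df$events == 0] <- NA
df$events
#[1] NA NA  1  1 NA NA  2  2 NA NA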
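Printing the intermediate run-length encoding makes the rle() answer easier to follow. Below is a step-by-step sketch of the same idea, with the within.list() call unrolled into plain assignments. The variable r is only illustrative:

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Runs of "below 90": lengths 2 2 2 2 2, values FALSE TRUE FALSE TRUE FALSE
r <- rle(values < 90)

# Replace each TRUE run with its event number. Assigning integers into the
# logical vector coerces it to integer, so the FALSE runs become 0.
r$values[r$values] <- seq_along(r$values[r$values])
# r$values: 0 1 0 2 0

# Expand back to one value per observation and mark non-event rows as NA.
events <- inverse.rle(r)
events[events == 0] <- NA
events
```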
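Or, more compactly, with data.table:
library(data.table)
# rleid() numbers every run of "below 90" / "90 or more"; keep only the below-90 runs
# (NA elsewhere) and let factor() renumber them 1, 2, ...
setDT(df)[, events := as.integer(factor(fifelse(values < 90, rleid(values < 90), NA_integer_)))]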
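The data.table answer relies on rleid(), which gives every run of identical values its own consecutive id. factor() then renumbers the ids that remain after dropping the runs at or above 90 as 1, 2, and so on. The same logic can be sketched in base R. This sketch assumes there are no missing values. It uses a hypothetical helper, run_id(), as a stand-in for data.table::rleid() on a single vector; run_id() is not part of any package:

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Hypothetical stand-in for data.table::rleid() on one vector:
# start a new id wherever the value differs from the previous one.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

below <- values < 90
ids <- run_id(below)
# ids: 1 1 2 2 3 3 4 4 5 5

# Drop the ids of runs at or above 90, then renumber the remaining ids 1, 2, ...
ids[!below] <- NA
events <- as.integer(factor(ids))
events
```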