Labeling conditional events in dplyr with sequential data

Question

In the example below, an event starts when the previous value of "values" is 90 or more and the current value is below 90. An event ends when the current value is below 90 and the next value is 90 or above.

sequential_index <- seq(1,10)
values <- c(91,90,89,89,90,90,89,88,90,91)
df <- data.frame(sequential_index, values)

Looking at df in the example above, the first event occurs for observations 3-4 and the second event occurs for observations 7-8. I am trying, to no avail, to add an "events" column to the above data frame that looks something like this:

   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA
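
To make the start/end definition concrete, here is a minimal base-R sketch that only flags the boundary rows. The names prev_val, next_val, is_start and is_end are illustrative and not part of the original post; it assumes the series has no missing values.

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Shift the series by one position in each direction; NA pads the open ends
prev_val <- c(NA, head(values, -1))
next_val <- c(tail(values, -1), NA)

# Start: previous value is 90 or more and current value is below 90
is_start <- !is.na(prev_val) & prev_val >= 90 & values < 90
# End: current value is below 90 and next value is 90 or more
is_end <- !is.na(next_val) & values < 90 & next_val >= 90

which(is_start)  # 3 7
which(is_end)    # 4 8
```

The flagged rows match the two events described above (observations 3-4 and 7-8).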

My dataset is rather large and I'm trying to avoid for loops.
Thanks in advance, -jt

Jet · Accepted Answer

I have this solution using dplyr.

library(dplyr)

df %>%
  # Define the start of events (putting 1 at the start of events)
  mutate(events = case_when(lag(values) >= 90 & values < 90 ~ 1, TRUE ~ 0)) %>%
  # Extend the events using cumsum()
  mutate(events = case_when(values < 90 ~ cumsum(events)))

Output:

   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA
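
To see why the flag-then-cumsum idea works, the same logic can be restated step by step in base R. This is a sketch rather than the answerer's code, and below, is_start and run_id are illustrative names.

```r
df <- data.frame(sequential_index = 1:10,
                 values = c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91))

below <- df$values < 90

# TRUE where the previous value is >= 90 and the current one is below 90;
# the first row has no previous value, so it can never be a start
is_start <- c(FALSE, head(df$values, -1) >= 90) & below

# Running count of starts seen so far: 0 0 1 1 1 1 2 2 2 2
run_id <- cumsum(is_start)

# Keep the count only on below-90 rows; every other row becomes NA
df$events <- ifelse(below, run_id, NA)
df$events
```

One edge case to be aware of: if the series begins below 90, no start is counted for that leading run, so it is labeled 0 rather than NA. The dplyr pipeline above appears to behave the same way.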

akrun · Answer

One option in base R is to use rle():

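# Run-length encode the "below 90" flag, then number the TRUE runs 1, 2, ...
# (the integer assignment coerces the FALSE runs to 0)
df$events <- inverse.rle(within.list(rle(df$values < 90),
    values[values] <- seq_along(values[values])
))
# Rows outside any event currently hold 0; mark them as NA
df$events[df$events == 0] <- NA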
df$events
#[1] NA NA  1  1 NA NA  2  2 NA NA
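
For readers unfamiliar with rle(), the following sketch unpacks the intermediate run-length encoding and rebuilds the labels with rep() instead of inverse.rle(). It is an illustrative variant, not the answer's exact code; r, ids and events are hypothetical names.

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Encode consecutive runs of the "below 90" flag
r <- rle(values < 90)
r$lengths  # 2 2 2 2 2
r$values   # FALSE TRUE FALSE TRUE FALSE

# Number the TRUE runs 1, 2, ... and leave the FALSE runs as NA
ids <- rep(NA_integer_, length(r$values))
ids[r$values] <- seq_len(sum(r$values))

# Expand each run's label back to one entry per row
events <- rep(ids, r$lengths)
events
```

Note that a run-length approach labels every below-90 run, including one at the very start of the series, whereas the lag-based approach only counts runs that follow a value of 90 or more.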

Or, more compactly, with data.table:

library(data.table)
# Number each run of values < 90, blank out the other runs, then renumber 1, 2, ...
setDT(df)[, events := as.integer(factor(replace(rleid(values < 90), values >= 90, NA)))]

Tags: r, dplyr

JimmyT
