Labeling conditional events in dplyr with sequential data

Tags: r, dplyr

In the example below, an event starts when the previous value of "values" is 90 or more and the current value is below 90. An event ends when the current value is below 90 and the next value is 90 or above.

sequential_index <- seq(1,10)
values <- c(91,90,89,89,90,90,89,88,90,91)
df <- data.frame(sequential_index, values)

Looking at df in the example above, the first event occurs for observations 3-4 and the second event occurs for observations 7-8. I am trying, to no avail, to add an "events" column to the above data frame that looks something like this:

   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA

My dataset is rather large and I'm trying to avoid for loops.
Thanks in advance, -jt

JimmyT asked Apr 26 '19 08:04


2 Answers

I have this solution using dplyr.

library(dplyr)

df %>%
  # Flag the first row of each event (1 at an event start, 0 elsewhere)
  mutate(events = case_when(lag(values) >= 90 & values < 90 ~ 1, TRUE ~ 0)) %>%
  # Running count of starts, kept only where values < 90 (NA elsewhere)
  mutate(events = case_when(values < 90 ~ cumsum(events)))

Output:

   sequential_index values events
1                 1     91     NA
2                 2     90     NA
3                 3     89      1
4                 4     89      1
5                 5     90     NA
6                 6     90     NA
7                 7     89      2
8                 8     88      2
9                 9     90     NA
10               10     91     NA
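
For anyone wondering why cumsum() does the job, here is a rough base R sketch of the same two steps with the intermediate vectors spelled out. The names prev, start and counter are only illustrative and do not appear in the answer above.

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Previous value of each observation (what dplyr::lag() returns)
prev <- c(NA, head(values, -1))

# TRUE only on the first row of each event
start <- !is.na(prev) & prev >= 90 & values < 90

# Running count of event starts: 0 0 1 1 1 1 2 2 2 2
counter <- cumsum(start)

# Keep the count only on rows that are inside an event
events <- ifelse(values < 90, counter, NA)
print(events)
# [1] NA NA  1  1 NA NA  2  2 NA NA
```

Note that if the series began below 90, those leading rows would get 0 rather than an event number, since there is no prior value of 90 or more to mark a start.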

Jet answered Oct 20 '22 12:10


One option in base R would be rle():

df$events <- inverse.rle(within.list(rle(df$values < 90),
    values[values] <- seq_along(values[values])))
df$events[df$events == 0] <- NA
df$events
#[1] NA NA  1  1 NA NA  2  2 NA NA
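
The one-liner above is fairly dense, so here is a longer sketch of the same run-length idea, assuming the same example data. The names runs and ids are just for illustration.

```r
values <- c(91, 90, 89, 89, 90, 90, 89, 88, 90, 91)

# Collapse the below-90 indicator into runs of equal values
runs <- rle(values < 90)
runs$lengths  # 2 2 2 2 2
runs$values   # FALSE TRUE FALSE TRUE FALSE

# Number the TRUE runs 1, 2, ... and leave the FALSE runs as NA
ids <- ifelse(runs$values, cumsum(runs$values), NA)

# Expand the per-run ids back to one value per observation
events <- rep(ids, runs$lengths)
print(events)
# [1] NA NA  1  1 NA NA  2  2 NA NA
```

One difference from the lag-based definition in the question: this numbers every run of values below 90, including one at the very start or end of the series that has no neighbouring value of 90 or above.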

Or in a compact way with data.table

library(data.table)
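# "run" is a temporary helper column holding the run id; it is dropped at the end
setDT(df)[, run := rleid(values < 90)]
df[values < 90, events := as.integer(factor(run))][, run := NULL]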

akrun answered Oct 20 '22 12:10