 

Find first sequence episode

Tags: r, sequence

I am trying to create a vector (wake) that marks the end of the first sleep sequence within each id, that is, the first row where var changes from sleep to 0.

My data looks like this:

   id time   var wake
1   1    1 sleep    0
2   1    2 sleep    0
3   1    3 sleep    0
4   1    4     0    0
5   1    5     0    0

This is the output I want:

   id time   var wake
1   1    1 sleep    0
2   1    2 sleep    0
3   1    3 sleep    0
4   1    4     0    1
5   1    5     0    0
6   1    6     0    0
7   1    7     0    0
8   1    8 sleep    0
9   1    9 sleep    0
10  1   10 sleep    0
11  2    1 sleep    0
12  2    2 sleep    0
13  2    3 sleep    0
14  2    4 sleep    0
15  2    5 sleep    0
16  2    6     0    1
17  2    7     0    0
18  2    8     0    0
19  2    9 sleep    0
20  2   10 sleep    0

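I was thinking of something like this:

library(dplyr)

dt$time = as.numeric(as.character(dt$time))  # factor -> numeric time
dt$var = ifelse(dt$var == 'sleep', 1, 0)     # sleep = 1, not asleep = 0

# count the changes of var within each id (0 = first run, 1 = second run, ...)
dt = dt %>% group_by(id) %>%
  mutate(grp = cumsum(var != lag(var, default = var[1])))

dt$wake = 0
dt$wake[dt$grp == 1] <- 1  # flag the rows of run 1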

However, this flags every row of the first non-sleep episode (rows 4 to 7 and 16 to 18), not only its first row.

Data:
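One way to finish this approach, as a minimal sketch assuming dplyr is installed: grp counts the changes of var within each id, so the first row of the first non-sleep episode is the row where grp first becomes 1. This assumes every id starts in a sleep run, as in the data here; toy is a hypothetical stand-in for dt with var kept as character.

```r
library(dplyr)

# Hypothetical toy data with the same layout as the question's dt
toy <- data.frame(
  id   = rep(1:2, each = 10),
  time = rep(1:10, times = 2),
  var  = c("sleep", "sleep", "sleep", "0", "0", "0", "0", "sleep", "sleep", "sleep",
           "sleep", "sleep", "sleep", "sleep", "sleep", "0", "0", "0", "sleep", "sleep"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  group_by(id) %>%
  mutate(
    # number of changes of var so far within this id
    grp  = cumsum(var != lag(var, default = var[1])),
    # 1 only where grp switches from 0 to 1: first row of the first episode
    wake = as.integer(grp == 1 & lag(grp, default = 0L) == 0L)
  ) %>%
  ungroup()

toy$wake  # 1 at rows 4 and 16, 0 elsewhere
```

Comparing grp with its own lag keeps the rest of the episode (where grp is still 1) unflagged.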

dt = structure(list(
  id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                 .Label = c("1", "2"), class = "factor"),
  time = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L,
                     1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L),
                   .Label = c("1", "10", "2", "3", "4", "5", "6", "7", "8", "9"),
                   class = "factor"),
  var = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
                    2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L),
                  .Label = c("0", "sleep"), class = "factor")),
  .Names = c("id", "time", "var"),
  row.names = c(NA, -20L), class = "data.frame")
asked Feb 04 '26 07:02 by giac


1 Answer

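In one pass with the data.table package:

library(data.table)
setDT(dt)  # convert dt to a data.table in place
# within each id: TRUE where the run id goes up by 1 and the new value is not sleep
dt[, wake := c(0, diff(rleid(var)) == 1) & var != "sleep", by = id]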

The idea is to get a run-length id for var with rleid(), which increments each time the value changes:

> dt[,rleid(var),by=id][,V1]
[1] 1 1 1 2 2 2 2 3 3 3 1 1 1 1 1 2 2 2 3 3
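For readers without data.table, the run ids that rleid() returns for a single vector can be reproduced in base R with rle(). This is a sketch of the idea (run_id is a hypothetical helper name), not data.table's implementation:

```r
# Base R sketch: a new integer id every time the value changes
run_id <- function(x) {
  r <- rle(as.character(x))  # rle() does not accept factors, so convert first
  rep(seq_along(r$lengths), r$lengths)
}

v <- c("sleep", "sleep", "sleep", "0", "0", "0", "0", "sleep", "sleep", "sleep")
run_id(v)
# [1] 1 1 1 2 2 2 2 3 3 3
```

This matches the first ten values above, which belong to id 1.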

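Its diff is +1 at every change of value (sleep to 0 or 0 to sleep). In the whole-column diff below it is also negative where a new id starts, because the run id restarts at 1; inside by = id that boundary never occurs: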

> diff(dt[,rleid(var),by=id][,V1])
[1]  0  0  1  0  0  0  1  0  0 -2  0  0  0  0  1  0  0  1  0

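Where the diff is 1 and var is not sleep (a change from sleep to 0), the expression is TRUE. The leading 0 in c(0, ...) pads the first row of each id, because diff() returns one element fewer than its input. Wrap the whole expression in as.numeric() if you want 1/0 instead of TRUE/FALSE.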
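Because diff(rleid(var)) == 1 is TRUE at every change of run, the expression flags every sleep-to-0 change, not only the first one; on the question's data there is just one such change per id, so the result is the same. A sketch that keeps only the first flag per id and returns 1/0, assuming data.table is installed (toy is hypothetical data in which id 1 has two non-sleep episodes):

```r
library(data.table)

# Hypothetical toy data: id 1 has two non-sleep episodes
toy <- data.table(
  id  = rep(1:2, each = 6),
  var = c("sleep", "0", "0", "sleep", "0", "sleep",
          "sleep", "sleep", "0", "0", "sleep", "sleep")
)

toy[, wake := {
  # TRUE at every change into a non-sleep run (same test as the answer)
  starts <- c(FALSE, diff(rleid(var)) == 1) & var != "sleep"
  # keep only the first such row within this id, as 1/0
  as.integer(starts & cumsum(starts) == 1)
}, by = id]

toy$wake  # 1 at the first sleep-to-0 change of each id only
```

cumsum(starts) equals 1 from the first flagged row until the next one, so combining it with starts keeps exactly one row per id.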

Output:

    nrow id time   var  wake
 1:    1  1    1 sleep FALSE
 2:    2  1    2 sleep FALSE
 3:    3  1    3 sleep FALSE
 4:    4  1    4     0  TRUE
 5:    5  1    5     0 FALSE
 6:    6  1    6     0 FALSE
 7:    7  1    7     0 FALSE
 8:    8  1    8 sleep FALSE
 9:    9  1    9 sleep FALSE
10:   10  1   10 sleep FALSE
11:   11  2    1 sleep FALSE
12:   12  2    2 sleep FALSE
13:   13  2    3 sleep FALSE
14:   14  2    4 sleep FALSE
15:   15  2    5 sleep FALSE
16:   16  2    6     0  TRUE
17:   17  2    7     0 FALSE
18:   18  2    8     0 FALSE
19:   19  2    9 sleep FALSE
20:   20  2   10 sleep FALSE
answered Feb 05 '26 23:02 by Tensibai