Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identify and count spells (Distinctive events within each group)

I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

enter image description here

ReprEx

df <- structure(list(time = structure(c(1538876340, 1538876400, 
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

I prefer a tidyverse solution.

Assumptions

  1. Data is sorted by group and then by time

  2. There are no gaps in time within each group


Update

Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)

  1. the rle approach by @markus took 0.53 seconds
  2. the cumsum approach by @M-M took 2.85 seconds
  3. the function approach by @MrFlick took 0.66 seconds
  4. the rle and dense_rank by @tmfmnk took 0.89

I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.

like image 711
Thomas Speidel Avatar asked Apr 01 '19 20:04

Thomas Speidel


2 Answers

One option using rle

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r) 
      }
  )
# A tibble: 14 x 4
# Groups:   group [2]
#   time                group  is.5 spell
#   <dttm>              <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A         0     0
# 2 2018-10-07 01:40:00 A         1     1
# 3 2018-10-07 01:41:00 A         1     1
# 4 2018-10-07 01:42:00 A         0     0
# 5 2018-10-07 01:43:00 A         1     2
# 6 2018-10-07 01:44:00 A         0     0
# 7 2018-10-07 01:45:00 A         0     0
# 8 2018-10-07 01:46:00 A         1     3
# 9 2018-05-20 14:00:00 B         0     0
#10 2018-05-20 14:01:00 B         0     0
#11 2018-05-20 14:02:00 B         1     1
#12 2018-05-20 14:03:00 B         1     1
#13 2018-05-20 14:04:00 B         0     0
#14 2018-05-20 14:05:00 B         1     2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r) 
  }, by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
#  lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
#  values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

like image 160
markus Avatar answered Nov 09 '22 15:11

markus


Here's a helper function that can return what you are after

spell_index <- function(time, flag) {
  change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
  cumsum(change) * (flag==1)+0
}

And you can use it with your data like

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = spell_index(time, is.5)
  )

Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.

like image 6
MrFlick Avatar answered Nov 09 '22 15:11

MrFlick