Identify and count spells (Distinctive events within each group)

Question

I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

enter image description here

ReprEx

df <- structure(list(time = structure(c(1538876340, 1538876400, 
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

I prefer a tidyverse solution.

Assumptions

Data is sorted by group and then by time
There are no gaps in time within each group

Update

Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)

the rle approach by @markus took 0.53 seconds
the cumsum approach by @M-M took 2.85 seconds
the function approach by @MrFlick took 0.66 seconds
the rle and dense_rank by @tmfmnk took 0.89

I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.

markus · Accepted Answer

One option using rle

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r) 
      }
  )
# A tibble: 14 x 4
# Groups:   group [2]
#   time                group  is.5 spell
#   <dttm>              <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A         0     0
# 2 2018-10-07 01:40:00 A         1     1
# 3 2018-10-07 01:41:00 A         1     1
# 4 2018-10-07 01:42:00 A         0     0
# 5 2018-10-07 01:43:00 A         1     2
# 6 2018-10-07 01:44:00 A         0     0
# 7 2018-10-07 01:45:00 A         0     0
# 8 2018-10-07 01:46:00 A         1     3
# 9 2018-05-20 14:00:00 B         0     0
#10 2018-05-20 14:01:00 B         0     0
#11 2018-05-20 14:02:00 B         1     1
#12 2018-05-20 14:03:00 B         1     1
#13 2018-05-20 14:04:00 B         0     0
#14 2018-05-20 14:05:00 B         1     2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r) 
  }, by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
#  lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
#  values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

MrFlick · Answer

Here's a helper function that can return what you are after

spell_index <- function(time, flag) {
  change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
  cumsum(change) * (flag==1)+0
}

And you can use it with your data like

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = spell_index(time, is.5)
  )

Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.

Identify and count spells (Distinctive events within each group)

Tags:

dataframe

r

dplyr

time-series

grouping

Update

Thomas Speidel

2 Answers

markus

MrFlick

Recent Activity

Donate For Us

Identify and count spells (Distinctive events within each group)

Tags:

dataframe

r

dplyr

time-series

grouping

Update

Thomas Speidel

2 Answers

markus

MrFlick

Related questions

Recent Activity

Donate For Us