Count length of sequential consecutive values per group in R

Question

I have a dataset with consecutive values and I would like to count how often each sequence length occurs. More specifically, I want to find out how many ids have a sequence running from 1:2, from 1:3, from 1:4, etc. Only sequences starting from 1 are of interest.

In this example, id 1 has a "full" sequence running from 1:3 (as the number 4 is missing), id 2 has a sequence running from 1:5, id 3 has a sequence running from 1:6, id 4 is not counted since it does not start with a value of 1, and id 5 has a sequence running from 1:3.

So we end up with two sequences ending at 3, one ending at 5, and one ending at 6.

Is there a clever way to calculate this, without resorting to inefficient loops?

Example data:

library(data.table)

data <- data.table( id    = c(1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5),
                    value = c(1,2,3,5,1,2,3,4,5,10,11,1,2,3,4,5,6,2,3,4,5,6,7,8,1,2,3,7))

 > data
    id value
 1:  1     1
 2:  1     2
 3:  1     3
 4:  1     5
 5:  2     1
 6:  2     2
 7:  2     3
 8:  2     4
 9:  2     5
10:  2    10
11:  2    11
12:  3     1
13:  3     2
14:  3     3
15:  3     4
16:  3     5
17:  3     6
18:  4     2
19:  4     3
20:  4     4
21:  4     5
22:  4     6
23:  4     7
24:  4     8
25:  5     1
26:  5     2
27:  5     3
28:  5     7
    id value

r2evans · Accepted Answer

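# len0 labels runs within each id (added to data by reference): the first run
# is the first value plus every following +1 step.
# Then summarize each (id, len0) run by its first value and its length.
out <- data[, len0 := rleid(c(TRUE, diff(value) == 1L)), by = .(id) ][
  , .(value1 = first(value), len = .N), by = .(id, len0) ]
out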
#       id  len0 value1   len
#    <num> <int>  <num> <int>
# 1:     1     1      1     3
# 2:     1     2      5     1
# 3:     2     1      1     5
# 4:     2     2     10     1
# 5:     2     3     11     1
# 6:     3     1      1     6
# 7:     4     1      2     7
# 8:     5     1      1     3
# 9:     5     2      7     1

Walk-through:

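Within each id, len0 is created to label the runs: it stays at 1 for as long as value keeps increasing by exactly 1, and changes once a step breaks that pattern.
Within each id, len0 pair, summarize with the first value (in case you only want runs starting at 1, see below) and the length of the run.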
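
To make the first step concrete, here is a minimal sketch that applies the same expression to a single vector (the values of id 2 from the example data). The intermediate names step_ok and run_id are chosen for illustration and do not appear in the answer.

```r
library(data.table)

# Values of id 2 from the example data.
value <- c(1, 2, 3, 4, 5, 10, 11)

# TRUE where the step from the previous value is exactly +1;
# the leading TRUE stands in for the first element, which has no predecessor.
step_ok <- c(TRUE, diff(value) == 1L)
step_ok
# TRUE TRUE TRUE TRUE TRUE FALSE TRUE

# rleid() starts a new integer id each time step_ok changes.
run_id <- rleid(step_ok)
run_id
# 1 1 1 1 1 2 3

# Size of the first run: how far the sequence gets from its first value.
sum(run_id == 1L)
# 5
```

Note that only the first run is a dependable "increase-by-1" run: 10 and 11 end up in separate runs here, which does not matter when only sequences starting at 1 are of interest.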

If you just want to know those whose sequences begin at one, filter on value1:

out[ value1 == 1L, ]
#       id  len0 value1   len
#    <num> <int>  <num> <int>
# 1:     1     1      1     3
# 2:     2     1      1     5
# 3:     3     1      1     6
# 4:     5     1      1     3

(I think you only need id and len at this point.)
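
The question ultimately asks how many ids reach each length. One possible way to get from the filtered table to that tally is sketched below; the column name n_ids is a hypothetical name picked for this example, and the data and run summary are repeated only so the block runs on its own.

```r
library(data.table)

data <- data.table(
  id    = c(1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5),
  value = c(1,2,3,5,1,2,3,4,5,10,11,1,2,3,4,5,6,2,3,4,5,6,7,8,1,2,3,7)
)

# Same run summary as in the answer above.
out <- data[, len0 := rleid(c(TRUE, diff(value) == 1L)), by = .(id)][
  , .(value1 = first(value), len = .N), by = .(id, len0)]

# Keep runs that start at 1, then count ids per run length.
# keyby sorts the result by len; n_ids is a name chosen for this sketch.
counts <- out[value1 == 1L, .(n_ids = .N), keyby = len]
counts
# len 3 -> 2 ids, len 5 -> 1 id, len 6 -> 1 id
```

This matches the expected result stated in the question: two sequences ending at 3, one at 5 and one at 6.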

chinsoon12 · Answer

Here is another option:

data[rowid(id)==value, max(value), id]

output:

#       id    V1
#    <num> <num>
# 1:     1     3
# 2:     2     5
# 3:     3     6
# 4:     5     3
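
The one-liner above works because rowid(id) numbers the rows 1, 2, 3, ... within each id, so a row can only satisfy rowid(id) == value while the values have kept pace with their position. The sketch below spells out that intermediate step. It assumes value holds distinct positive integers in increasing order within each id, as in the example data; the order() call and the names sorted, pos, prefix and len are additions for illustration, not part of the original answer.

```r
library(data.table)

data <- data.table(
  id    = c(1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5),
  value = c(1,2,3,5,1,2,3,4,5,10,11,1,2,3,4,5,6,2,3,4,5,6,7,8,1,2,3,7)
)

# Sort defensively: the position comparison assumes increasing values per id.
sorted <- data[order(id, value)]

# pos is the 1-based row position within each id.
sorted[, pos := rowid(id)]

# Rows whose value equals its position form the 1, 2, 3, ... prefix.
prefix <- sorted[pos == value]

# The largest matching value per id is the length of the sequence from 1.
res <- prefix[, .(len = max(value)), by = id]
res
# ids 1, 2, 3, 5 with len 3, 5, 6, 3; id 4 drops out because it starts at 2
```

If the values can contain duplicates or are not integers, this shortcut should not be relied on without checking the data first.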
