Using Run-Length Encoding and Generating Sums

Question

I have the following run-length encoding data.

df1 <- structure(list(lengths = c(2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L), values = c(10, 9, NA, 5, 4, 3, NA, 2, NA, 1, 0, NA, 0)), row.names = c(NA, -13L), class = "data.frame")
df1
# > df1
#    lengths values
# 1        2     10
# 2        3      9
# 3        2     NA
# 4        1      5
# 5        1      4
# 6        1      3
# 7        1     NA
# 8        1      2
# 9        2     NA
# 10       1      1
# 11       1      0
# 12       3     NA
# 13       1      0

Using a particular threshold (0.01), I create a new variable in this data frame.

df1$Below_Threshold <- df1$values <= 0.01  # the comparison already yields TRUE/FALSE/NA
df1
# > df1
#    lengths values Below_Threshold
# 1        2     10           FALSE
# 2        3      9           FALSE
# 3        2     NA              NA
# 4        1      5           FALSE
# 5        1      4           FALSE
# 6        1      3           FALSE
# 7        1     NA              NA
# 8        1      2           FALSE
# 9        2     NA              NA
# 10       1      1           FALSE
# 11       1      0            TRUE
# 12       3     NA              NA
# 13       1      0            TRUE

I now want to perform run-length encoding on this new variable, but instead of simply returning the number of occurrences, I want to return the sum of the lengths column from the first data frame. The result should look like the sum column in the df2 data frame in the following chunk of code.

df2 <- structure(list(values = c(FALSE, NA, FALSE, NA, FALSE, NA, FALSE, TRUE, NA, TRUE), sum = c(5, 2, 3, 1, 1, 2, 1, 1, 3, 1)), class = "data.frame", row.names = c(NA, -10L))
df2
# > df2
#    values sum
# 1   FALSE   5
# 2      NA   2
# 3   FALSE   3
# 4      NA   1
# 5   FALSE   1
# 6      NA   2
# 7   FALSE   1
# 8    TRUE   1
# 9      NA   3
# 10   TRUE   1

Is there a nice, efficient way of achieving this result? Base R solutions are preferred, but all are welcome.

KU99 · Accepted Answer

library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df1 %>%
  group_by(grp = consecutive_id(values <= 0.01)) %>%
  summarise(values = first(values) <= 0.01, sum = sum(lengths))

# A tibble: 10 × 3
     grp values   sum
   <int> <lgl>  <int>
 1     1 FALSE      5
 2     2 NA         2
 3     3 FALSE      3
 4     4 NA         1
 5     5 FALSE      1
 6     6 NA         2
 7     7 FALSE      1
 8     8 TRUE       1
 9     9 NA         3
10    10 TRUE       1

If repeating the comparison feels repetitive, compute the logical column once with mutate() first:

df1 %>%
  mutate(values = values <= 0.01) %>%
  group_by(grp = consecutive_id(values)) %>%
  summarise(values = first(values), sum = sum(lengths))

# A tibble: 10 × 3
     grp values   sum
   <int> <lgl>  <int>
 1     1 FALSE      5
 2     2 NA         2
 3     3 FALSE      3
 4     4 NA         1
 5     5 FALSE      1
 6     6 NA         2
 7     7 FALSE      1
 8     8 TRUE       1
 9     9 NA         3
10    10 TRUE       1
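
Since the question prefers base R, here is a minimal sketch of the same idea without dplyr. It is not part of the original answer. The helper names below, state, and grp are made up for illustration, and the -1L code for NA is an arbitrary choice. The NA handling is done by hand because rle() treats every NA as its own run, whereas consecutive_id() keeps adjacent NA values together.

```r
df1 <- data.frame(
  lengths = c(2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L),
  values  = c(10, 9, NA, 5, 4, 3, NA, 2, NA, 1, 0, NA, 0)
)

# TRUE / FALSE / NA flag, as in the question
below <- df1$values <= 0.01

# Recode the three states as integers so NA can be compared like any other value
# (-1L for NA is an arbitrary, illustrative choice)
state <- ifelse(is.na(below), -1L, as.integer(below))

# A new run starts wherever the state differs from the previous row
grp <- cumsum(c(TRUE, state[-1L] != state[-length(state)]))

# One row per run: the run's flag and the total of the original lengths
df2 <- data.frame(
  values = below[!duplicated(grp)],
  sum    = as.vector(tapply(df1$lengths, grp, sum))
)
df2
```

On the data from the question this gives the same values and sum columns as the dplyr result above, with sum stored as an integer.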

Tags: r, cumulative-sum, run-length-encoding

David Moore
