Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting number of rows since last observation that fulfills condition

Tags:

r

dplyr

My Data Frame looks something like the first three columns of this example:

id    obs   value   newCol
a     1     uncool  NA
a     2     cool    1
a     3     uncool  NA
a     4     uncool  NA
a     5     cool    2
a     6     uncool  NA
a     7     cool    1
a     8     uncool  NA
b     1     cool    0

What I need is a column (newCol above) that counts the number of "uncool"s between the observations with value "cool" or the first row of the group (grouped by id).

How do I do that (by using dplyr ideally)?

like image 628
TIm Haus Avatar asked Feb 07 '19 09:02

TIm Haus


1 Answers

We can define groups by doing a cumsum starting from the bottom, then use ave to build a vector for each group :

transform(dat, newCol = ave(
  value, id, rev(cumsum(rev(value=="cool"))),
  FUN = function(x) ifelse(x=="cool", length(x)-1, NA)))
#   id obs  value newCol
# 1  a   1 uncool   <NA>
# 2  a   2   cool      1
# 3  a   3 uncool   <NA>
# 4  a   4 uncool   <NA>
# 5  a   5   cool      2
# 6  a   6 uncool   <NA>
# 7  a   7   cool      1
# 8  a   8 uncool   <NA>
# 9  b   1   cool      0

With dplyr :

dat %>%
  group_by(id,temp = rev(cumsum(rev(value=="cool")))) %>%
  mutate(newCol = ifelse(value=="cool", n()-1, NA)) %>%
  ungroup() %>%
  select(-temp)
# # A tibble: 9 x 4
# id   obs  value newCol
#   <chr> <int>  <chr>  <dbl>
# 1     a     1 uncool     NA
# 2     a     2   cool      1
# 3     a     3 uncool     NA
# 4     a     4 uncool     NA
# 5     a     5   cool      2
# 6     a     6 uncool     NA
# 7     a     7   cool      1
# 8     a     8 uncool     NA
# 9     b     1   cool      0
like image 125
Moody_Mudskipper Avatar answered Sep 29 '22 09:09

Moody_Mudskipper