My Data Frame
looks something like the first three columns of this example:
id obs value newCol
a 1 uncool NA
a 2 cool 1
a 3 uncool NA
a 4 uncool NA
a 5 cool 2
a 6 uncool NA
a 7 cool 1
a 8 uncool NA
b 1 cool 0
What I need is a column (newCol above) that counts the number of "uncool"s between the observations with value "cool" or the first row of the group (grouped by id).
How do I do that (by using dplyr
ideally)?
We can define groups by doing a cumsum
starting from the bottom, then use ave
to build a vector for each group :
transform(dat, newCol = ave(
value, id, rev(cumsum(rev(value=="cool"))),
FUN = function(x) ifelse(x=="cool", length(x)-1, NA)))
# id obs value newCol
# 1 a 1 uncool <NA>
# 2 a 2 cool 1
# 3 a 3 uncool <NA>
# 4 a 4 uncool <NA>
# 5 a 5 cool 2
# 6 a 6 uncool <NA>
# 7 a 7 cool 1
# 8 a 8 uncool <NA>
# 9 b 1 cool 0
With dplyr :
dat %>%
group_by(id,temp = rev(cumsum(rev(value=="cool")))) %>%
mutate(newCol = ifelse(value=="cool", n()-1, NA)) %>%
ungroup() %>%
select(-temp)
# # A tibble: 9 x 4
# id obs value newCol
# <chr> <int> <chr> <dbl>
# 1 a 1 uncool NA
# 2 a 2 cool 1
# 3 a 3 uncool NA
# 4 a 4 uncool NA
# 5 a 5 cool 2
# 6 a 6 uncool NA
# 7 a 7 cool 1
# 8 a 8 uncool NA
# 9 b 1 cool 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With