I have the following data:
data <- structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L,
1234L, 1234L, 1234L, 1234L, 1234L, 4758L, 4758L, 9584L, 9584L,
9584L, 9584L, 9584L, 9584L), time = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), fruit = structure(c(1L,
6L, 1L, 1L, 6L, 5L, 5L, 3L, 4L, 1L, 2L, 4L, 2L, 1L, 6L, 5L, 5L,
3L, 2L), .Label = c("apple", "banana", "lemon", "lime", "orange",
"pear"), class = "factor"), count = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), cum_sum = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 1L, 2L, 3L,
4L, 5L, 6L)), .Names = c("user", "time", "fruit", "count", "cum_sum"
), row.names = c(NA, -19L), class = "data.frame")
For every user in this set, I want to look at the sequence of fruits over time. But some fruits appear back to back in time, as in the first rows for user 1234:
  user time  fruit count cum_sum
1 1234    1  apple     1       1
2 1234    2   pear     1       2
3 1234    3  apple     1       3
4 1234    4  apple     1       4
5 1234    5   pear     1       5
6 1234    6 orange     1       6
7 1234    7 orange     1       7
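What I'm looking for is a per-user time series of fruits in which consecutive repeats of the same fruit are collapsed into a single entry.
The problem is that if I group by user and fruit and then summarise, dplyr returns one row per user/fruit pair, sorted by fruit (alphabetically, since that is the factor level order), so the time order is lost: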
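library(dplyr)

data %>%
  group_by(user, fruit) %>%    # one group per user/fruit pair, ordered by fruit level
  summarise(temp_var = 1) %>%  # every occurrence of a fruit collapses into one row
  mutate(cum_sum = cumsum(temp_var))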
What I really want, for user 1234 above for example, is the fruits listed in time order but with consecutive duplicates removed. So where we see apple > pear > apple > apple > pear > orange > orange, we'd instead see apple > pear > apple > pear > orange.
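The rule described here is run-length encoding: keep one value per run of identical consecutive values. As a minimal base R sketch, rle() does this directly. The vector name seq_1234 is made up for illustration; it holds the fruits for user 1234 at times 1 to 7 from the sample above. rle() expects a plain atomic vector, so a factor column such as data$fruit should go through as.character() first.

```r
# Fruits for user 1234 at times 1 to 7 (hypothetical name, copied from the sample data)
seq_1234 <- c("apple", "pear", "apple", "apple", "pear", "orange", "orange")

# rle() encodes runs of identical consecutive values; $values keeps one entry per run
collapsed <- rle(seq_1234)$values

cat(paste(collapsed, collapse = " > "), sep = "\n")
# apple > pear > apple > pear > orange
```

Note that the second "apple" survives because it is not adjacent to the first one; only back-to-back repeats are dropped.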
Using the rleid function from data.table (version 1.9.6 or later, on CRAN), we can simply do the following (though I'm not sure about your exact desired output):
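library(data.table) ## v >= 1.9.6
## note: setDT() converts `data` to a data.table by reference
## rleid(fruit) gives every run of identical consecutive fruits its own id,
## so grouping by user and that id keeps one row (the run's fruit) per run
res <- setDT(data)[, .(fruit = fruit[1L]), by = .(user, indx = rleid(fruit))
                   ][, cum_sum := seq_len(.N), by = user  # number the runs within each user
                   ][, indx := NULL]                       # drop the helper run id
res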
#     user  fruit cum_sum
#  1: 1234  apple       1
#  2: 1234   pear       2
#  3: 1234  apple       3
#  4: 1234   pear       4
#  5: 1234 orange       5
#  6: 1234  lemon       6
#  7: 1234   lime       7
#  8: 1234  apple       8
#  9: 1234 banana       9
# 10: 4758   lime       1
# 11: 4758 banana       2
# 12: 9584  apple       1
# 13: 9584   pear       2
# 14: 9584 orange       3
# 15: 9584  lemon       4
# 16: 9584 banana       5
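Since the question started from dplyr, here is a rough dplyr equivalent. It uses a different technique from rleid: instead of building run ids, it compares each fruit with the previous one via lag() within each user, in time order. This is a sketch assuming a reasonably recent dplyr; to keep it self-contained it rebuilds the sample data by hand, leaving out the count and cum_sum columns.

```r
library(dplyr)

# Sample data rebuilt by hand (same users, times and fruits as the dput above)
data <- data.frame(
  user  = c(rep(1234L, 11), rep(4758L, 2), rep(9584L, 6)),
  time  = c(1:11, 5:6, 1:6),
  fruit = factor(c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
                   "lemon", "lime", "apple", "banana",
                   "lime", "banana",
                   "apple", "pear", "orange", "orange", "lemon", "banana"))
)

res <- data %>%
  group_by(user) %>%
  arrange(time, .by_group = TRUE) %>%             # time order within each user
  mutate(prev = lag(as.character(fruit))) %>%     # previous fruit for the same user
  filter(is.na(prev) | as.character(fruit) != prev) %>%  # keep first row of each run
  mutate(cum_sum = row_number()) %>%              # number the runs within each user
  ungroup() %>%
  select(user, fruit, cum_sum)

res
```

Keeping a row only when its fruit differs from the one just before it (within the same user) should give the same 16 rows as the data.table output above.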