I have the following data frame:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7))
On each day, I would like a cumulative sum of unique values, taking only the most recent value for an entry that repeats. My expected output is as follows:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7), Sum=c(1, 3, 6, 10, 14, 18, 7))
Day ID Value Sum
1 1 A 1 1
2 1 B 2 3
3 1 C 3 6
4 1 D 4 10
5 1 A 5 14
6 1 B 6 18
7 2 B 7 7
where the 5th entry adds up values 2, 3, 4, 5 (because A repeats) and the 6th entry adds up values 3, 4, 5, and 6 (because both A and B repeat). The 7th entry restarts because it is a new day.
I don't think I can use cumsum() as it only accepts 1 parameter. I also don't want to keep a counter for each ID as I may have up to 100 unique IDs per day.
Any hints or help would be appreciated! Thank you!
You can difference the values by ID and Day and then use cumsum
:
library(data.table)
setDT(d)
d[, v_eff := Value - shift(Value, fill=0), by=.(Day, ID)]
d[, s := cumsum(v_eff), by=Day]
Day ID Value Sum v_eff s
1: 1 A 1 1 1 1
2: 1 B 2 3 2 3
3: 1 C 3 6 3 6
4: 1 D 4 10 4 10
5: 1 A 5 14 4 14
6: 1 B 6 18 4 18
7: 2 B 7 7 7 7
Base R analogue...
d$v_eff <- with(d, ave(Value, Day, ID, FUN = function(x) c(x[1], diff(x)) ))
d$s <- with(d, ave(v_eff, Day, FUN = cumsum))
Day ID Value Sum v_eff s
1 1 A 1 1 1 1
2 1 B 2 3 2 3
3 1 C 3 6 3 6
4 1 D 4 10 4 10
5 1 A 5 14 4 14
6 1 B 6 18 4 18
7 2 B 7 7 7 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With