I have a data.table in R, and I would like to apply rolling sum by group to it. But the problem is the group length is not the same and when the rollapply function reaches the shorter group, it will encounter an error. Is there a way to solve this, except for-loops?
The following is a simple example to illustrate the problem.
DT <- data.table(id = c(rep("A", 6), rep("B", 2), rep("C", 8)),
val = c(1:6, 1:2, 1:8))
> DT
id val
1: A 1
2: A 2
3: A 3
4: A 4
5: A 5
6: A 6
7: B 1
8: B 2
9: B 1
10: B 2
11: B 3
12: B 4
13: B 5
14: B 6
15: C 7
16: C 8
Rolling sum of 4 numbers, using rollapplyr()
DT[, cum.sum := rollapplyr(val, width = 4, FUN = sum, fill = NA), by = id]
But this will give me an error
Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument
And the output is
> DT
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 NA
13: C 5 NA
14: C 6 NA
15: C 7 NA
16: C 8 NA
Ideally, the output should be
> DT
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 10
13: C 5 14
14: C 6 18
15: C 7 22
16: C 8 26
We can do
DT[, cum.sum := Reduce(`+`, shift(val, 0:3)), by=id]
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 10
13: C 5 14
14: C 6 18
15: C 7 22
16: C 8 26
I knew I'd seen this somewhere before - possibly a duplicate?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With