I am trying to sum pieces of a series using data.table in R. The idea is that I define a start index and an end index as columns in the table, then create a third column holding the sum of the series between the start and end indexes.
library(data.table)
series = c(1,2,3,4,5,6)
a = data.table(start=c(1,2,3),end=c(4,5,6))
a[,S := sum(series[start:end])]
a
Expected result:
start end S
1: 1 4 10
2: 2 5 14
3: 3 6 18
Actual result:
Warning messages:
1: In start:end : numerical expression has 3 elements: only the first used
2: In start:end : numerical expression has 3 elements: only the first used
> a
start end S
1: 1 4 10
2: 2 5 10
3: 3 6 10
What am I missing here? If I just do a[,S := start+end] the code executes as one would expect.
One option is to loop over the 'start' and 'end' columns in parallel with Map, build the sequence (:) from each pair of elements, sum the matching slice of series, and unlist the resulting list so it can be assigned (:=) to a new column:
a[, S := unlist(Map(function(x, y) sum(series[x:y]), start, end))]
Output:
a
# start end S
#1: 1 4 10
#2: 2 5 14
#3: 3 6 18
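The : operator is not vectorized over its operands: it expects a single value on each side and, when given longer vectors, uses only the first element of each. That is why the original code raised a warning and filled every row with sum(series[1:4]), which is 10.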
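Since : needs a single value on each side, another way to supply that is to group by row, so that start and end each have length 1 inside every group. This is only a minimal sketch on the question's data; grouping with by = seq_len(nrow(a)) is a common data.table idiom, not something taken from the answer above.

```r
library(data.table)

series <- c(1, 2, 3, 4, 5, 6)
a <- data.table(start = c(1, 2, 3), end = c(4, 5, 6))

# Group by row number: within each group, start and end are scalars,
# so start:end builds the intended index sequence for that row.
a[, S := sum(series[start:end]), by = seq_len(nrow(a))]

a$S
```

This reads close to the original attempt, but it evaluates the expression once per row, so it may be slow on large tables.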
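You can use the arithmetic series formula, which works here because series is just the consecutive integers 1 to 6, so each element equals its index: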
a[, S := (end - start + 1) * (start + end) / 2]
Gives:
start end S
1: 1 4 10
2: 2 5 14
3: 3 6 18
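If the series is not a run of consecutive integers, the closed-form formula no longer applies, but a fully vectorized version is still possible with cumulative sums. The sketch below assumes start <= end and that both are valid indexes into series; cs and other_series are hypothetical names introduced for illustration.

```r
library(data.table)

series <- c(1, 2, 3, 4, 5, 6)
a <- data.table(start = c(1, 2, 3), end = c(4, 5, 6))

# cs[k + 1] holds the sum of the first k elements (cs[1] is 0),
# so the sum over start..end is cs[end + 1] - cs[start].
cs <- c(0, cumsum(series))
a[, S := cs[end + 1] - cs[start]]

# Same idea on an arbitrary series (hypothetical data).
other_series <- c(5, 1, 4, 2, 8, 3)
cs2 <- c(0, cumsum(other_series))
a[, S2 := cs2[end + 1] - cs2[start]]

a
```

Here S matches the expected result from the question (10, 14, 18), and no per-row loop is needed.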