
Summing sequences in R using data.table

Tags:

r

data.table

I am trying to sum pieces of a series using data.table in R. The idea is that I define a start index and an end index as columns in the table, then add a third column holding the sum of the series between the start and end indexes.

series = c(1,2,3,4,5,6)
a = data.table(start=c(1,2,3),end=c(4,5,6))
a[,S := sum(series[start:end])]
a

Expected result:

   start end  S
1:     1   4 10
2:     2   5 14
3:     3   6 18

Actual result:

Warning messages:
1: In start:end : numerical expression has 3 elements: only the first used
2: In start:end : numerical expression has 3 elements: only the first used
> a
   start end  S
1:     1   4 10
2:     2   5 10
3:     3   6 10

What am I missing here? If I just do a[,S := start+end] the code executes as one would expect.

Sinnombre asked May 19 '26 11:05


2 Answers

One option is to loop over the 'start' and 'end' columns in parallel with Map: for each pair, build the sequence (:) of indexes, subset 'series' with it, and take the sum. Map returns a list, so unlist it before assigning (:=) the result to a new column.

a[, S := unlist(Map(function(x, y) sum(series[x:y]), start, end))]

Output:

a
#   start end  S
#1:     1   4 10
#2:     2   5 14
#3:     3   6 18

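The : operator is not vectorized over its operands: when either side has more than one element, only the first is used. That is why the original code warned, evaluated start:end as 1:4 for the whole column, and assigned sum(series[1:4]) = 10 to every row.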
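Since : needs a single value on each side, another way to get there is to group by row number, so that 'start' and 'end' are scalars inside j. The following is only a sketch under some assumptions: the series values are made up (chosen to differ from their positions so an indexing mistake would be visible), and by = seq_len(nrow(a)) is just one common idiom for row-wise evaluation, not necessarily the fastest.

```r
library(data.table)

# Hypothetical series: values deliberately differ from their positions
series <- c(10, 20, 30, 40, 50, 60)
a <- data.table(start = c(1, 2, 3), end = c(4, 5, 6))

# Each row is its own group, so start:end sees exactly one start and one end
a[, S := sum(series[start:end]), by = seq_len(nrow(a))]

a$S  # 100 140 180
```

Row-wise grouping is easy to read but does one small sum per row, so on large tables it will probably be slower than a fully vectorized approach.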

akrun answered May 21 '26 01:05


Because the series here is just 1, 2, ..., 6, each value equals its index, so you can skip the loop and use the arithmetic series formula:

a[, S := (end - start + 1) * (start + end) / 2]

Gives:

   start end  S
1:     1   4 10
2:     2   5 14
3:     3   6 18
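
The closed-form formula sums the integers from start to end. If the series holds arbitrary values, a cumulative-sum lookup keeps things vectorized in the same spirit. This is a minimal sketch, not part of the original answer: the series values and the helper name 'cs' are made up for illustration, and it assumes 1 <= start <= end <= length(series).

```r
library(data.table)

# Hypothetical series: values deliberately differ from their positions
series <- c(10, 20, 30, 40, 50, 60)
a <- data.table(start = c(1, 2, 3), end = c(4, 5, 6))

# cs[k + 1] holds the sum of the first k elements; cs[1] is the empty sum, 0
cs <- c(0, cumsum(series))

# Sum of series[start:end] = (sum of first `end`) - (sum of first `start - 1`)
a[, S := cs[end + 1] - cs[start]]

a$S  # 100 140 180
```

With non-integer values the subtraction can pick up small floating-point differences compared with summing each slice directly.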
Ritchie Sacramento answered May 21 '26 03:05