Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - cumulative sum by condition

Tags:

r

So I have a dataset which simplified looks something like this:

Year    ID     Sum
2009    999    100
2009    123     85
2009    666    100
2009    999    100
2009    123     90
2009    666     85
2010    999    100
2010    123    100
2010    666     95
2010    999     75
2010    123    100
2010    666     85

I'd like to add a column with the cumulative sum, by year and ID. Like this:

Year     ID     Sum    Cum.Sum
2009    999     100        100
2009    123      85         85  
2009    666     100        100  
2009    999     100        200
2009    123      90        175
2009    666      85        185
2010    999     100        100
2010    123     100        100
2010    666      95         95
2010    999      75        175
2010    123     100        200
2010    666      85        180

I think this should be pretty straight-forward, but somehow I haven't been able to figure it out. How do I do this? Thanks for the help!

like image 260
Alex T Avatar asked Sep 20 '13 20:09

Alex T


2 Answers

Using data.table:

require(data.table)
DT <- data.table(DF)
DT[, Cum.Sum := cumsum(Sum), by=list(Year, ID)]

    Year  ID Sum Cum.Sum
 1: 2009 999 100     100
 2: 2009 123  85      85
 3: 2009 666 100     100
 4: 2009 999 100     200
 5: 2009 123  90     175
 6: 2009 666  85     185
 7: 2010 999 100     100
 8: 2010 123 100     100
 9: 2010 666  95      95
10: 2010 999  75     175
11: 2010 123 100     200
12: 2010 666  85     180
like image 102
Arun Avatar answered Nov 05 '22 16:11

Arun


Another way

1) use ddply to sum a variable by group (similar to SQL group by)

X <- ddply ( dataset, .(Year,ID), sum)

2) merge the result with dataset

Y <- merge( dataset, X, by=('Year','ID')
like image 36
holgrich Avatar answered Nov 05 '22 17:11

holgrich