Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate offset cumulative sum begining with 0 for each group

Tags:

r

My sample data looks like this:

>         gros id nr_oriz
>      1:   23  1       1
>      2:   16  1       2
>      3:   14  1       3
>      4:   15  1       4
>      5:   22  1       5
>      6:   30  1       6
>      7:   25  2       1
>      8:   10  2       2
>      9:   13  2       3
>     10:   17  2       4
>     11:   45  2       5
>     12:   25  4       1
>     13:   15  4       2
>     14:   20  4       3
>     15:   20  4       4
>     16:   20  4       5

where gros is the depth for each soil horizon, id is the profile number and nr_horiz is soil horizon number. I need to create two columns: top and bottom, where top is the upper limit of horizon and bottom the lower limit. We have managed to obtain only the bottom values using:

topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)

but for top values we need somehow to offset the data for each id and to calculate cumulative sum beginning from 0 and without the last value, like in this example:

    gros id nr_oriz top bottom
 1:   23  1       1   0     23
 2:   16  1       2  23     39
 3:   14  1       3  39     53
 4:   15  1       4  53     68
 5:   22  1       5  68     90
 6:   30  1       6  90    120
 7:   25  2       1   0     25
 8:   10  2       2  25     35
 9:   13  2       3  35     48
10:   17  2       4  48     65
11:   45  2       5  65    110
12:   25  4       1   0     25
13:   15  4       2  25     40
14:   20  4       3  40     60
15:   20  4       4  60     80
16:   20  4       5  80    100

Is there a simple solution for this, taking into account that the database is very large and we cannot do it manually (as we did with top column in this sample).

like image 981
Rosca Bogdan Avatar asked Jan 08 '23 22:01

Rosca Bogdan


1 Answers

You can just use ave again, but on the "bottom" column and with a custom function:

topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))

As it appears you are using the data.table package, you could modify your code to take advantage of data.table's syntax and performance. In order to calculate bottom, you would simply do:

topsoil[, bottom := cumsum(gros), by = id]

Then to calculate top:

topsoil[, top := c(0L, bottom[-.N]), by = id]

Or you can wrap them up in a single step similarly to how was illustrated by @akrun's answer.

like image 79
Joshua Ulrich Avatar answered Jan 30 '23 20:01

Joshua Ulrich