My sample data looks like this:
> gros id nr_oriz
> 1: 23 1 1
> 2: 16 1 2
> 3: 14 1 3
> 4: 15 1 4
> 5: 22 1 5
> 6: 30 1 6
> 7: 25 2 1
> 8: 10 2 2
> 9: 13 2 3
> 10: 17 2 4
> 11: 45 2 5
> 12: 25 4 1
> 13: 15 4 2
> 14: 20 4 3
> 15: 20 4 4
> 16: 20 4 5
where gros
is the depth for each soil horizon, id
is the profile number and nr_horiz
is soil horizon number. I need to create two columns: top and bottom, where top is the upper limit of horizon and bottom the lower limit. We have managed to obtain only the bottom values using:
topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)
but for top values we need somehow to offset the data for each id
and to calculate cumulative sum beginning from 0 and without the last value, like in this example:
gros id nr_oriz top bottom
1: 23 1 1 0 23
2: 16 1 2 23 39
3: 14 1 3 39 53
4: 15 1 4 53 68
5: 22 1 5 68 90
6: 30 1 6 90 120
7: 25 2 1 0 25
8: 10 2 2 25 35
9: 13 2 3 35 48
10: 17 2 4 48 65
11: 45 2 5 65 110
12: 25 4 1 0 25
13: 15 4 2 25 40
14: 20 4 3 40 60
15: 20 4 4 60 80
16: 20 4 5 80 100
Is there a simple solution for this, taking into account that the database is very large and we cannot do it manually (as we did with top
column in this sample).
You can just use ave
again, but on the "bottom" column and with a custom function:
topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))
As it appears you are using the data.table
package, you could modify your code to take advantage of data.table's syntax and performance. In order to calculate bottom
, you would simply do:
topsoil[, bottom := cumsum(gros), by = id]
Then to calculate top
:
topsoil[, top := c(0L, bottom[-.N]), by = id]
Or you can wrap them up in a single step similarly to how was illustrated by @akrun's answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With