I am pretty new to R and want to calculate the cumulative standard deviation by group. I have a data frame D with a visitor ID and the time on page (top) that the visitor spent on each page, as below:
ID top
v1 2.3
v1 4.8
v1 10.2
v2 16.2
v2 12.2
v2 14.3
v2 12.4
v3 8.2
v3 8.8
The output needs to look like this:
ID top cum_sd
v1 2.3
v1 4.8 1.76
v1 10.2 4.03
v2 16.2
v2 12.2 2.82
v2 14.3 2.00
v2 12.4 1.15
v3 8.2
v3 8.8 0.42
Thank you for the help in advance.
We can use runSD from the TTR package. Convert the 'data.frame' to a 'data.table' with setDT(df1); then, grouped by 'ID', apply runSD to the 'top' column and assign (:=) the output to create the 'cum_sd' column.
library(data.table)
library(TTR)
# cumulative = TRUE makes runSD use every value from the start of each group
setDT(df1)[, cum_sd := round(runSD(top, n = 1, cumulative = TRUE), 2), by = ID]
df1
# ID top cum_sd
#1: v1 2.3 NA
#2: v1 4.8 1.77
#3: v1 10.2 4.04
#4: v2 16.2 NA
#5: v2 12.2 2.83
#6: v2 14.3 2.00
#7: v2 12.4 1.87
#8: v3 8.2 NA
#9: v3 8.8 0.42
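As a sanity check, the values for group v2 can be recomputed by hand with base sd(). This is only a small sketch, and the names top_v2 and check_v2 are made up for illustration. It gives 1.87 for the fourth v2 row, which agrees with the output above rather than with the 1.15 shown in the question.

```r
# Recompute the cumulative SD for group v2 by hand, using only base R.
top_v2 <- c(16.2, 12.2, 14.3, 12.4)

# SD of the first k values, for k = 1..4.
# k = 1 gives NA because a single value has no sample SD.
check_v2 <- sapply(seq_along(top_v2), function(k) sd(top_v2[seq_len(k)]))

round(check_v2, 2)
# values: NA, 2.83, 2.00, 1.87
```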
You can do it with base functions:
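# sd() of the first i values for each i; the first value of a group gives NA
cumsd <- function(x) sapply(seq_along(x), function(i) sd(x[seq_len(i)]))
# ave() applies cumsd within each ID and returns the results in the original row order
df1$cum_sd <- ave(df1$top, df1$ID, FUN = cumsd)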
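For longer groups, recomputing sd() on every prefix repeats a lot of work. One possible alternative, sketched below, derives the cumulative SD from running sums with cumsum(). The helper name cumsd_fast is hypothetical, and the data frame is rebuilt from the question's sample so the snippet runs on its own. Treat it as a sketch: the running-sum formula can lose precision when the values are large relative to their spread.

```r
# Sample data from the question (named df1 to match the answers).
df1 <- data.frame(
  ID  = rep(c("v1", "v2", "v3"), times = c(3, 4, 2)),
  top = c(2.3, 4.8, 10.2, 16.2, 12.2, 14.3, 12.4, 8.2, 8.8)
)

# Hypothetical helper: cumulative sample SD from running sums.
# For the first k values: var_k = (sum(x^2) - sum(x)^2 / k) / (k - 1)
cumsd_fast <- function(x) {
  k  <- seq_along(x)
  ss <- cumsum(x^2) - cumsum(x)^2 / k
  # pmax() guards against tiny negative values caused by rounding error.
  out <- sqrt(pmax(ss, 0) / (k - 1))
  # A single value has no sample SD, so mark the first position as NA.
  out[k == 1] <- NA_real_
  out
}

# Apply the helper within each ID, keeping the original row order.
df1$cum_sd <- round(ave(df1$top, df1$ID, FUN = cumsd_fast), 2)
df1
```

On this sample the rounded result matches the sd()-based version: NA, 1.77, 4.04 for v1; NA, 2.83, 2.00, 1.87 for v2; NA, 0.42 for v3.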