I have a group of data in the format:
ID Minutes Value
xxxx 118 3
xxxx 121 4
xxxx 122 3
yyyy 122 6
xxxx 123 4
yyyy 123 8
... ... ....
Each ID is a patient and each value is, say, blood pressure for that minute. I would like to create a rolling average for the 60 minutes before and 60 minutes after each point. However - as you can see, there are missing minutes (so I cannot merely use row numbers) and I would like to create average for each unique ID (so the average for ID xxxx cannot include values assigned to ID yyyy). It sounds like rollapply or rollingstat might be options, but have had little success trying to piece this together...
Please let me know if further clarity is needed.
You can easily fill in the missing Minutes (Value will be set to NA), then use rollapply
library(data.table)
library(zoo)
## Convert to data.table
DT <- data.table(DF, key=c("IDs", "Minutes"))
## Missing Minutes will be added in. Value will be set to NA.
DT <- DT[CJ(unique(IDs), seq(min(Minutes), max(Minutes)))]
## Run your function
DT[, rollapply(value, 60, mean, na.rm=TRUE), by=IDs]
You can do it all in one shot:
## Convert your DF to a data.able
DT <- data.table(DF, key=c("IDs", "Minutes"))
## Compute rolling means, with on-the-fly padded minutes
DT[ CJ(unique(IDs), seq(min(Minutes), max(Minutes))) ][,
rollapply(value, 60, mean, na.rm=TRUE), by=IDs]
An alternative approach that uses tidyr/dplyr
instead of data.table
and RcppRoll
instead of zoo
:
library(dplyr)
library(tidyr)
library(RcppRoll)
d %>%
group_by(ID) %>%
# add rows for unosberved minutes
complete(Minutes = full_seq(Minutes, 1)) %>%
# RcppRoll::roll_mean() is written in C++ for speed
mutate(moving_mean = roll_mean(Value, 131, fill = NA, na.rm = TRUE)) %>%
# keep only the rows that were originally observed
filter(!is.na(Value))
data
d <- data_frame(
ID = rep(1:3, each = 5),
Minutes = rep(c(1, 30, 60, 120, 200), 3),
Value = rpois(15, lambda = 10)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With