Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional rolling mean (moving average) on irregular time series

I have a group of data in the format:

ID    Minutes Value
xxxx  118     3 
xxxx  121     4 
xxxx  122     3 
yyyy  122     6 
xxxx  123     4 
yyyy  123     8 
...   ...     .... 

Each ID is a patient and each value is, say, blood pressure for that minute. I would like to create a rolling average for the 60 minutes before and 60 minutes after each point. However - as you can see, there are missing minutes (so I cannot merely use row numbers) and I would like to create average for each unique ID (so the average for ID xxxx cannot include values assigned to ID yyyy). It sounds like rollapply or rollingstat might be options, but have had little success trying to piece this together...

Please let me know if further clarity is needed.

like image 908
user3239045 Avatar asked Jan 27 '14 03:01

user3239045


2 Answers

You can easily fill in the missing Minutes (Value will be set to NA), then use rollapply

library(data.table)
library(zoo)

## Convert to data.table
DT <- data.table(DF, key=c("IDs", "Minutes"))

## Missing Minutes will be added in. Value will be set to NA. 
DT <- DT[CJ(unique(IDs), seq(min(Minutes), max(Minutes)))]

## Run your function
DT[, rollapply(value, 60, mean, na.rm=TRUE), by=IDs]

Alternatively, you don't need to keep the 'padded' Minutes / NA Values:

You can do it all in one shot:

## Convert your DF to a data.able
DT <- data.table(DF, key=c("IDs", "Minutes"))

## Compute rolling means, with on-the-fly padded minutes
DT[ CJ(unique(IDs), seq(min(Minutes), max(Minutes))) ][, 
  rollapply(value, 60, mean, na.rm=TRUE), by=IDs]
like image 135
Ricardo Saporta Avatar answered Oct 16 '22 09:10

Ricardo Saporta


An alternative approach that uses tidyr/dplyr instead of data.table and RcppRoll instead of zoo:

library(dplyr)
library(tidyr)
library(RcppRoll)

d %>% 
  group_by(ID) %>%
  # add rows for unosberved minutes
  complete(Minutes = full_seq(Minutes, 1)) %>%
  # RcppRoll::roll_mean() is written in C++ for speed 
  mutate(moving_mean = roll_mean(Value, 131, fill = NA, na.rm = TRUE)) %>%
  # keep only the rows that were originally observed
  filter(!is.na(Value))

data

d <- data_frame(
  ID = rep(1:3, each = 5),
  Minutes = rep(c(1, 30, 60, 120, 200), 3),
  Value = rpois(15, lambda = 10)
)
like image 23
davechilders Avatar answered Oct 16 '22 09:10

davechilders