
Time-based averaging (sliding window) of columns in a data.frame

Tags: dataframe, r

I have a data.frame with multiple columns. One of the columns is time and is therefore non-decreasing. The remaining columns contain observations recorded at the time given in the same row.

I want to select a window of time, say "x" seconds, and calculate the average (or for that matter any function) of the entries in some other columns in the same data.frame for that window.

Of course, because it's a time-based average, the number of entries in a window can vary with the data: the number of rows that fall within a given time window is not fixed.

I have done this using a custom function, which creates a new column in the data.frame. The new column assigns a single number to all the entries in a time window, and that number is unique across all the time windows. This essentially divides the data into groups based on the time windows. Then I use R's "aggregate" function to calculate the mean.
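The custom function itself is not shown here. As a rough sketch of the approach described above, assuming a numeric time column in seconds and windows measured from the first timestamp (the data, the column names `time`, `obs`, `group`, and the 5-second window are all made up for illustration), it might look like this:

```r
# Hypothetical example data: irregularly spaced times (in seconds)
# and a single observation column
df <- data.frame(time = c(0.5, 1.2, 4.9, 5.1, 7.3, 12.0, 14.8),
                 obs  = c(1, 2, 3, 4, 5, 6, 7))

window <- 5  # window length "x" in seconds (assumed value)

# Assign one group number per time window, counted from the first timestamp
df$group <- (df$time - df$time[1]) %/% window

# Average the observations within each window; any other function
# could be passed as FUN instead of mean
result <- aggregate(obs ~ group, data = df, FUN = mean)
print(result)
```

With this data the windows hold four, one and two rows respectively, which shows how the row count per window varies.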

I was just wondering if there is an existing R function that can do the grouping based on a time interval or if there is a better (cleaner) way to do this.

nixbox asked Oct 20 '10 17:10


1 Answer

Assuming your data.frame contains only numeric data, this is one way to do it using zoo/xts:

> library(xts)  # also loads zoo
> Data <- data.frame(Time=Sys.time()+1:20,x=rnorm(20))
> xData <- xts(Data[,-1], Data[,1])
> period.apply(xData, endpoints(xData, "seconds", 5), colMeans)
                           [,1]
2010-10-20 13:34:19 -0.20725660
2010-10-20 13:34:24 -0.01219346
2010-10-20 13:34:29 -0.70717312
2010-10-20 13:34:34  0.09338097
2010-10-20 13:34:38 -0.22330363

EDIT: using only base R packages. The means are the same, but the times are slightly different because endpoints starts the 5-second interval with the first observation. The code below groups on 5-second intervals starting with seconds = 0.

> nSeconds <- 5
> agg <- aggregate(Data[,-1], by=list(as.numeric(Data$Time) %/% nSeconds), mean)
> agg[,1] <- .POSIXct(agg[,1]*nSeconds)  # >= R-2.12.0 required for .POSIXct
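
To see how the integer division lines the groups up with seconds = 0, here is a small reproducible sketch. The fixed UTC timestamps, the values, and the names `times`, `vals`, `bucket`, `means` and `starts` are assumptions made for illustration, and `tapply` stands in for `aggregate` only to keep the example short:

```r
# Hypothetical fixed timestamps (UTC) so the bucketing is reproducible:
# 13:34:17 through 13:34:24, one observation per second
times <- as.POSIXct("2010-10-20 13:34:17", tz = "UTC") + 0:7
vals  <- c(1, 2, 3, 4, 5, 6, 7, 8)
nSeconds <- 5

# Integer division of the epoch seconds gives one bucket id
# per 5-second interval, aligned to multiples of 5 seconds
bucket <- as.numeric(times) %/% nSeconds

# Mean per bucket; multiplying the id back by nSeconds
# recovers the start time of each interval
means  <- tapply(vals, bucket, mean)
starts <- as.POSIXct(as.numeric(names(means)) * nSeconds,
                     origin = "1970-01-01", tz = "UTC")
print(data.frame(start = starts, mean = as.vector(means)))
```

The first three observations (:17 to :19) fall in the interval starting at 13:34:15 and the remaining five in the one starting at 13:34:20, so each group is labelled with its interval start rather than with an observation time.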

Joshua Ulrich answered Oct 06 '22 00:10