aggregate 1-minute data into 5-minute average data

Question

I want to aggregate data collected every minute into 5-minute averages.

DeviceTime         Concentration
6/20/2013 11:13       
6/20/2013 11:14
6/20/2013 11:15
6/20/2013 11:16
6/20/2013 11:17
6/20/2013 11:18
6/20/2013 11:19
6/20/2013 11:20
6/20/2013 11:21
6/20/2013 11:22
6/20/2013 11:23
6/20/2013 11:24
6/20/2013 11:25
6/20/2013 11:26
6/20/2013 11:27
6/20/2013 11:28

...

The result I want is like:

DeviceTime             Concentration
6/20/2013 11:15
6/20/2013 11:20
6/20/2013 11:25
6/20/2013 11:30
6/20/2013 11:35
...

Each 5-minute average is simply the mean of the concentrations recorded over the preceding five minutes.

lukeA · Accepted Answer

Using the dplyr package, and assuming your data is stored in a data frame named df with DeviceTime as a date-time (POSIXct) column:

require(dplyr)
df %>%
  group_by(DeviceTime = cut(DeviceTime, breaks="5 min")) %>%
  summarize(Concentration = mean(Concentration))

Mark Rajcok · Answer

If your data doesn't start on a 5-minute wall-clock boundary (your sample data starts at 11:13), note that cut() creates its breakpoints starting from the earliest timestamp it finds. That usually isn't what you want, and your sample output indicates it isn't what you want here.

Here's what cut() does:

df <- read.table(header=TRUE, sep=",", stringsAsFactors=FALSE, text="
DeviceTime,Concentration
6/20/2013 11:13,1
6/20/2013 11:14,1
6/20/2013 11:15,2
6/20/2013 11:16,2
6/20/2013 11:17,2
6/20/2013 11:18,2
6/20/2013 11:19,2
6/20/2013 11:20,3
6/20/2013 11:21,3
6/20/2013 11:22,3
6/20/2013 11:23,3
6/20/2013 11:24,3
6/20/2013 11:25,4")
df$DeviceTime <- as.POSIXct(df$DeviceTime, format="%m/%d/%Y %H:%M")

cut(df$DeviceTime, breaks="5 min")
 [1] 2013-06-20 11:13:00 2013-06-20 11:13:00 2013-06-20 11:13:00
 [4] 2013-06-20 11:13:00 2013-06-20 11:13:00 2013-06-20 11:18:00
 [7] 2013-06-20 11:18:00 2013-06-20 11:18:00 2013-06-20 11:18:00
[10] 2013-06-20 11:18:00 2013-06-20 11:23:00 2013-06-20 11:23:00
[13] 2013-06-20 11:23:00

means <- aggregate(df["Concentration"], 
                   list(fiveMin=cut(df$DeviceTime, "5 mins")),
                   mean)
means
              fiveMin Concentration
1 2013-06-20 11:13:00      1.600000
2 2013-06-20 11:18:00      2.600000
3 2013-06-20 11:23:00      3.333333

Notice that the first row of means (the 11:13:00 entry) is the mean of the first 5 rows of df, which have times of 11:13 to 11:17 -- i.e., up until just before the next cut/break point of 11:18.

You'll get the same result with dplyr (i.e., @lukeA's answer) if you use cut():

df %>%
  group_by(DeviceTime = cut(DeviceTime, breaks="5 min")) %>%
  summarize(Concentration = mean(Concentration))
Source: local data frame [3 x 2]

           DeviceTime Concentration
1 2013-06-20 11:13:00      1.600000
2 2013-06-20 11:18:00      2.600000
3 2013-06-20 11:23:00      3.333333
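
If you want to stay with cut() but have the bins fall on wall-clock boundaries, one option is to pass explicit breakpoints instead of "5 min". The following is only a minimal sketch: it rebuilds the sample data in UTC (an assumption, made to keep the result reproducible), and the names step, secs, breaks and meansAligned are made up for illustration.

```r
# Sample data from above, rebuilt in UTC (the timezone is an assumption)
df <- data.frame(
  DeviceTime = as.POSIXct("2013-06-20 11:13", tz = "UTC") + 60 * (0:12),
  Concentration = c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4)
)

step <- 5 * 60  # bin width in seconds
secs <- as.numeric(range(df$DeviceTime))

# Breakpoints run from the 5-minute boundary at or before the first
# observation to the first boundary strictly after the last one
breaks <- as.POSIXct(
  seq(floor(secs[1] / step) * step, (floor(secs[2] / step) + 1) * step, by = step),
  origin = "1970-01-01", tz = "UTC"
)

# right = FALSE makes each bin include its start and exclude its end
bins <- cut(df$DeviceTime, breaks = breaks, right = FALSE)
meansAligned <- aggregate(df["Concentration"], list(fiveMin = bins), mean)
meansAligned
```

With the sample data this groups 11:13 to 11:14, 11:15 to 11:19, 11:20 to 11:24, and 11:25 on its own, so the means come out as 1, 2, 3 and 4.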

The xts package seems to break by wall clock time:

require(xts)
df.xts <- xts(df$Concentration, df$DeviceTime)
means.xts <- period.apply(df.xts, endpoints(df.xts, "mins", k=5), mean)
means.xts
                    [,1]
2013-06-20 11:14:00    1
2013-06-20 11:19:00    2
2013-06-20 11:24:00    3
2013-06-20 11:25:00    4

The time values are always the last time entry found in each 5-minute window. You can round the time index up to the next 5-minute boundary with align.time() if you want to report the end of each period:

means.rounded <- align.time(means.xts, 5*60)
means.rounded
                    [,1]
2013-06-20 11:15:00    1
2013-06-20 11:20:00    2
2013-06-20 11:25:00    3
2013-06-20 11:30:00    4
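
Note in the output above that a timestamp already sitting on a boundary (11:25) is moved to the next one (11:30). The base R sketch below reproduces that rounding rule purely to illustrate the arithmetic; round_up is a hypothetical helper, not an xts function, and UTC is assumed.

```r
# Hypothetical helper: push each time to the next multiple of n seconds.
# A time that is already on a boundary moves a full n seconds forward.
round_up <- function(x, n) {
  x + (n - as.numeric(x) %% n)
}

ends <- as.POSIXct(c("2013-06-20 11:14", "2013-06-20 11:19",
                     "2013-06-20 11:24", "2013-06-20 11:25"), tz = "UTC")
format(round_up(ends, 5 * 60), "%H:%M")
# "11:15" "11:20" "11:25" "11:30"
```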

You can also round down, if you want to report the times of the beginning of the periods. But you'll need to define your own function first (which I found on Cross Validated):

align.time.down = function(x,n) {
    index(x) = index(x) - n
    align.time(x,n)
}
means.rounded.down <- align.time.down(means.xts, 5*60)
means.rounded.down
                    [,1]
2013-06-20 11:10:00    1
2013-06-20 11:15:00    2
2013-06-20 11:20:00    3
2013-06-20 11:25:00    4

Another solution, which uses floor() rather than the xts package, is as follows:

df$DeviceTimeFloor <- as.POSIXct(floor(as.numeric(df$DeviceTime) / (5 * 60)) * (5 * 60), origin='1970-01-01')
meansFloor <- aggregate(Concentration ~ DeviceTimeFloor, df, mean)
meansFloor
      DeviceTimeFloor Concentration
1 2013-06-20 11:10:00             1
2 2013-06-20 11:15:00             2
3 2013-06-20 11:20:00             3
4 2013-06-20 11:25:00             4

I prefer to report the start time of the 5-minute interval, and floor() is good for this. If I were to report aggregates by hour, I would expect a timestamp of 2013-06-20 11:00:00 to contain data for the period 11:00:00 - 11:59:59, not 10:00:00 - 10:59:59.

If you prefer to report the end time of the intervals, ceiling() can be used instead of floor(). But note that timestamps 11:01 - 11:05 will be converted to (and hence grouped at) 11:05 by ceiling(). In contrast, floor() converts 11:00 - 11:04 to 11:00.

So floor() and ceiling() each group a different set of observations. The xts package groups the same set of observations as floor(), but it reports the timestamp of the last observation in each period.
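
For comparison, here is a sketch of the ceiling() variant applied to the same sample data. It assumes the timestamps are in UTC, and the names step, DeviceTimeCeil and meansCeil are made up for illustration.

```r
# Sample data from above, rebuilt in UTC (the timezone is an assumption)
df <- data.frame(
  DeviceTime = as.POSIXct("2013-06-20 11:13", tz = "UTC") + 60 * (0:12),
  Concentration = c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4)
)

step <- 5 * 60  # bin width in seconds

# Round each timestamp up to the end of its 5-minute interval;
# a timestamp exactly on a boundary stays where it is
df$DeviceTimeCeil <- as.POSIXct(
  ceiling(as.numeric(df$DeviceTime) / step) * step,
  origin = "1970-01-01", tz = "UTC"
)

meansCeil <- aggregate(Concentration ~ DeviceTimeCeil, df, mean)
meansCeil
```

Here 11:13 to 11:15 are grouped at 11:15, 11:16 to 11:20 at 11:20, and 11:21 to 11:25 at 11:25, so the means differ from the floor() result.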

Marek Lahoda · Answer

I'd say the easiest and cleanest way to do this is using the lubridate and dplyr packages.

library(lubridate)  # for working with dates
library(dplyr)      # for manipulating data

df$DeviceTime5min <- floor_date(df$DeviceTime, "5 mins")
df_5min <- df %>% group_by(DeviceTime5min) %>% summarize(Concentration = mean(Concentration))

The only problem is that it works cleanly just for interval lengths that divide evenly into an hour, that is: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60 min. But for these it works perfectly :-)
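
For interval lengths that do not divide an hour evenly, one alternative is to floor the epoch seconds directly in base R, which gives bins of uniform width. This is only a sketch: floor_time is a hypothetical helper, and the tz argument defaults to UTC as an assumption.

```r
# Hypothetical helper: floor date-times to multiples of `minutes`,
# counted from the Unix epoch rather than from the start of each hour
floor_time <- function(x, minutes, tz = "UTC") {
  step <- minutes * 60
  as.POSIXct(floor(as.numeric(x) / step) * step, origin = "1970-01-01", tz = tz)
}

times <- as.POSIXct("2013-06-20 11:13", tz = "UTC") + 60 * (0:12)
bins7 <- floor_time(times, 7)   # 7-minute bins
bins5 <- floor_time(times, 5)   # 5-minute bins, starting at 11:10
```

Because the bins are counted from the epoch, the edges for a width such as 7 minutes will generally not line up with the top of the hour.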

Tags:

date

r

Vicki1227
