In R, I have a data.frame whose first column holds the time (as POSIXct). The remaining columns (e.g., column 2) are numeric data.
I would like to group the times into 3-minute intervals. Each interval should hold the average of the values that fall into it.
Right now, I have a for-loop that iterates through the time column and generates the intervals on the fly. Is there a more elegant way to accomplish the same thing?
Thanks in advance.
Derek
I think that a command like the following would return a list of the rows that fall into each 3-minute interval (v is the name of the data frame and datecol is the name of the date column):
library(plyr)
v<-data.frame(datecol=as.POSIXct(c(
"2010-01-13 03:02:38 UTC",
"2010-01-13 03:03:14 UTC",
"2010-01-13 03:05:52 UTC",
"2010-01-13 03:07:42 UTC",
"2010-01-13 03:09:38 UTC",
"2010-01-13 03:10:14 UTC",
"2010-01-13 03:12:52 UTC",
"2010-01-13 03:13:42 UTC",
"2010-01-13 03:15:42 UTC",
"2010-01-13 03:16:38 UTC",
"2010-01-13 03:18:14 UTC",
"2010-01-13 03:21:52 UTC",
"2010-01-13 03:22:42 UTC",
"2010-01-13 03:24:19 UTC",
"2010-01-13 03:25:19 UTC"
)), x = cumsum(runif(15)*10),y=cumsum(runif(15)*20))
dlply(v,.(cut(datecol,"3 min")),"[")
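Since the question asks for the average in each interval rather than the grouped rows, the same cut() grouping can also be passed to base R's aggregate(). This is only a minimal sketch: the four sample rows are made up for illustration, and the `interval` column name is my own choice, not something from the original post.

```r
# Small illustrative data frame (values are invented, not from the question)
v <- data.frame(
  datecol = as.POSIXct(c("2010-01-13 03:02:38",
                         "2010-01-13 03:03:14",
                         "2010-01-13 03:05:52",
                         "2010-01-13 03:07:42"), tz = "UTC"),
  x = c(1, 3, 5, 7),
  y = c(10, 20, 30, 40)
)

# cut() assigns each timestamp to a 3-minute bin; bins start at the
# first timestamp rounded down to the whole minute
v$interval <- cut(v$datecol, breaks = "3 min")

# Average every numeric column within each bin
means <- aggregate(cbind(x, y) ~ interval, data = v, FUN = mean)
print(means)
```

Bins that contain no observations simply do not appear in the result, so gaps in the data show up as missing rows rather than NA rows.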
The zoo and xts packages excel at this and have copious documentation. Here is a pre-canned to.minutes3(), but I have also used aggregate.zoo() with custom functions doing the same by hand:
> library(xts)
> x <- xts(cumsum(abs(rnorm(20))), Sys.time()+60*(0:19))
> x
[,1]
2010-05-27 14:44:25 1.2870
2010-05-27 14:45:25 3.3187
2010-05-27 14:46:25 4.0976
2010-05-27 14:47:25 5.3304
2010-05-27 14:48:25 6.9415
2010-05-27 14:49:25 7.4508
2010-05-27 14:50:25 8.5281
2010-05-27 14:51:25 8.7145
2010-05-27 14:52:25 9.0120
2010-05-27 14:53:25 10.5063
2010-05-27 14:54:25 11.6312
2010-05-27 14:55:25 11.9813
2010-05-27 14:56:25 13.8883
2010-05-27 14:57:25 14.1696
2010-05-27 14:58:25 14.3269
2010-05-27 14:59:25 14.6768
2010-05-27 15:00:25 15.4926
2010-05-27 15:01:25 16.8408
2010-05-27 15:02:25 18.7739
2010-05-27 15:03:25 19.7815
> to.minutes3(x)
x.Open x.High x.Low x.Close
2010-05-27 14:44:25 1.2870 1.2870 1.2870 1.2870
2010-05-27 14:47:25 3.3187 5.3304 3.3187 5.3304
2010-05-27 14:50:25 6.9415 8.5281 6.9415 8.5281
2010-05-27 14:53:25 8.7145 10.5063 8.7145 10.5063
2010-05-27 14:56:25 11.6312 13.8883 11.6312 13.8883
2010-05-27 14:59:25 14.1696 14.6768 14.1696 14.6768
2010-05-27 15:02:25 15.4926 18.7739 15.4926 18.7739
2010-05-27 15:03:25 19.7815 19.7815 19.7815 19.7815
>
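Note that to.minutes3() returns open/high/low/close bars rather than averages. If the mean per 3-minute bucket is what is wanted, one possible sketch combines xts's endpoints() and period.apply(). The six sample values and the fixed UTC start time are assumptions made for illustration only.

```r
library(xts)

# Six one-minute observations starting at 14:45:00 UTC (made-up values)
times <- as.POSIXct("2010-05-27 14:45:00", tz = "UTC") + 60 * (0:5)
x <- xts(c(1, 2, 3, 4, 5, 6), order.by = times)

# endpoints() returns the row index of the last observation in each
# 3-minute block, with a leading 0
ep <- endpoints(x, on = "minutes", k = 3)

# Apply mean() to the rows between consecutive endpoints
means <- period.apply(x, INDEX = ep, FUN = mean)
print(means)
```

Each result row is stamped with the last observation time in its block, which matches the labelling seen in the to.minutes3() output above.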