Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a time interval of 15 minutes from minutely data in R?

Tags:

time

format

r

I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

So I have from 00:00 to 23:59hours and with a counter per minute. I'd like to group the data in intervals of 15 minutes such that:

time           count
00:00-00:15    148   
00:16-00:30    284

I have tried to do it manually but this is exhausting so I am sure there has to be a function or sth to do it easily but I haven't figured out yet how to do it.

I'd really appreciate some help!!

Thank you very much!

like image 792
adrian1121 Avatar asked Apr 24 '16 19:04

adrian1121


1 Answers

For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))

Base R

cut the data into 15 minute groups:

dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

library(data.table)

dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.

If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could to as.POSIXct(as.character(dat$by15)) + 60*14.

If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.

like image 167
eipi10 Avatar answered Oct 26 '22 05:10

eipi10