I have some data which is formatted in the following way:
time count
00:00 17
00:01 62
00:02 41
So I have from 00:00 to 23:59hours and with a counter per minute. I'd like to group the data in intervals of 15 minutes such that:
time count
00:00-00:15 148
00:16-00:30 284
I have tried to do it manually but this is exhausting so I am sure there has to be a function or sth to do it easily but I haven't figured out yet how to do it.
I'd really appreciate some help!!
Thank you very much!
For data that's in POSIXct format, you can use the cut
function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R
and with the dplyr
and data.table
packages.
First, create some fake data:
set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
count=sample(1:50, 100, replace=TRUE))
Base R
cut
the data into 15 minute groups:
dat$by15 = cut(dat$time, breaks="15 min")
time count by15 1 2016-05-01 00:00:00 22 2016-05-01 00:00:00 2 2016-05-01 00:01:00 11 2016-05-01 00:00:00 3 2016-05-01 00:02:00 31 2016-05-01 00:00:00 ... 98 2016-05-01 01:37:00 20 2016-05-01 01:30:00 99 2016-05-01 01:38:00 29 2016-05-01 01:30:00 100 2016-05-01 01:39:00 37 2016-05-01 01:30:00
Now aggregate
by the new grouping column, using sum
as the aggregation function:
dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
by15 count 1 2016-05-01 00:00:00 312 2 2016-05-01 00:15:00 395 3 2016-05-01 00:30:00 341 4 2016-05-01 00:45:00 318 5 2016-05-01 01:00:00 349 6 2016-05-01 01:15:00 397 7 2016-05-01 01:30:00 341
dplyr
library(dplyr)
dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
summarise(count=sum(count))
data.table
library(data.table)
dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1
. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct
is denominated in seconds. The as.POSIXct(as.character(...))
is because cut
returns a factor and this just converts it back to date-time so that we can do math on it.
If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could to as.POSIXct(as.character(dat$by15)) + 60*14
.
If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With