Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group time by every n minutes in R

I have a dataframe with a lot of time series:

1   0:03    B   1
2   0:05    A   1
3   0:05    A   1
4   0:05    B   1
5   0:10    A   1
6   0:10    B   1
7   0:14    B   1
8   0:18    A   1
9   0:20    A   1
10  0:23    B   1
11  0:30    A   1

I want to group the time series into every 6 minutes and count the frequency of A and B:

1   0:06    A   2
2   0:06    B   2
3   0:12    A   1
4   0:12    B   1
5   0:18    A   1
6   0:24    A   1
7   0:24    B   1
8   0:18    A   1
9   0:30    A   1

Also, the class of the time series is character. What should I do?

like image 567
Yiqian Yang Avatar asked Jan 26 '26 15:01

Yiqian Yang


1 Answers

Here's an approach to convert times to POSIXct, cut the times by 6 minute intervals, then count.

First, you need to specify the year, month, day, hour, minute, and seconds of your data. This will help with scaling it to larger datasets.

library(tidyverse)
library(lubridate)

# sample data
d <- data.frame(t = paste0("2019-06-02 ", 
                           c("0:03","0:06","0:09","0:12","0:15",
                             "0:18","0:21","0:24","0:27","0:30"), 
                           ":00"),
                g = c("A","A","B","B","B"))

d$t <- ymd_hms(d$t) # convert to POSIXct with `lubridate::ymd_hms()`

If you check the class of your new date column, you will see it is "POSIXct".

> class(d$t)
[1] "POSIXct" "POSIXt" 

Now that the data is in "POSIXct", you can cut it by minute intervals! We will add this new grouping factor as a new column called tc.

d$tc <- cut(d$t, breaks = "6 min")  
d
                     t g                  tc
1  2019-06-02 00:03:00 A 2019-06-02 00:03:00
2  2019-06-02 00:06:00 A 2019-06-02 00:03:00
3  2019-06-02 00:09:00 B 2019-06-02 00:09:00
4  2019-06-02 00:12:00 B 2019-06-02 00:09:00
5  2019-06-02 00:15:00 B 2019-06-02 00:15:00
6  2019-06-02 00:18:00 A 2019-06-02 00:15:00
7  2019-06-02 00:21:00 A 2019-06-02 00:21:00
8  2019-06-02 00:24:00 B 2019-06-02 00:21:00
9  2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00

Now you can group_by this new interval (tc) and your grouping column (g), and count the frequency of occurences. Getting the frequency of observations in a group is a fairly common operation, so dplyr provides count for this:

count(d, g, tc)
# A tibble: 7 x 3
  g     tc                      n
  <fct> <fct>               <int>
1 A     2019-06-02 00:03:00     2
2 A     2019-06-02 00:15:00     1
3 A     2019-06-02 00:21:00     1
4 B     2019-06-02 00:09:00     2
5 B     2019-06-02 00:15:00     1
6 B     2019-06-02 00:21:00     1
7 B     2019-06-02 00:27:00     2

If you run ?dplyr::count() in the console, you'll see that count(d, tc) is simply a wrapper for group_by(d, g, tc) %>% summarise(n = n()).

like image 83
Rich Pauloo Avatar answered Jan 28 '26 05:01

Rich Pauloo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!