How do you group an R dataframe by time interval and column value?

Tags: r

I have a dataset listing the times at which events happen at specific facilities:

> head(facility_events)
  facility_id          event_time
1       20248 2018-01-01 00:00:01
2       12445 2018-01-01 00:00:04
3       20248 2018-01-01 00:00:08
4       17567 2018-01-01 00:00:47
5       17567 2018-01-01 00:03:50
6       10459 2018-01-01 00:04:01

I would like to generate a dataframe of event counts, grouping the data by facility and also grouping the events into 3-minute intervals. The output would look something like this:

count facility interval
2     20248    0
1     12445    0
1     17567    0
1     17567    1
1     10459    1

How do you accomplish this in R?

lreeder asked Dec 13 '22 18:12


2 Answers

You can use tidyverse with lubridate for this:

df <- data.frame(facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
                 event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04", "2018-01-01 00:00:08", "2018-01-01 00:00:47", "2018-01-01 00:03:50", "2018-01-01 00:04:01")))

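library(tidyverse)

df %>%
    # minute() is the minute of the hour (0-59), so %/% 3 gives 3-minute bins 0-19 within each hour
    mutate(interval = lubridate::minute(event_time) %/% 3) %>%
    # count the events in each (facility, interval) group
    group_by(facility_id, interval) %>%
    summarise(count = n())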

# A tibble: 5 x 3
# Groups: facility_id [?]
  facility_id interval count
        <dbl>    <int> <int>
1       10459        1     1
2       12445        0     1
3       17567        0     1
4       17567        1     1
5       20248        0     2
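
Note that `minute()` returns the minute within the hour, so with data covering more than one hour, events at 00:00 and 01:00 would land in the same interval. A minimal sketch of one way around this, assuming intervals should be numbered from midnight of the first event's day. The sample rows, the `origin` variable, and the explicit UTC time zone are illustrative assumptions, not part of the question:

```r
library(dplyr)

# Hypothetical events spanning more than one hour (assumed data for illustration)
events <- data.frame(
  facility_id = c(20248, 20248, 17567, 20248),
  event_time = as.POSIXct(
    c("2018-01-01 00:00:01", "2018-01-01 00:00:08",
      "2018-01-01 00:03:50", "2018-01-01 01:00:30"),
    tz = "UTC"
  )
)

# Midnight of the day of the earliest event; intervals are counted from here
origin <- as.POSIXct(trunc(min(events$event_time), units = "days"))

counts <- events %>%
  # Seconds elapsed since origin, integer-divided into 180-second (3-minute) bins
  mutate(interval = as.numeric(difftime(event_time, origin, units = "secs")) %/% 180) %>%
  # One row per (facility_id, interval) with the number of events in it
  count(facility_id, interval, name = "count")

print(counts)
```

Here the 01:00:30 event gets interval 20 instead of being merged into interval 0 with the two events just after midnight.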

m0nhawk answered Dec 18 '22 00:12


Here is a solution with data.table, using the same logic:

  • Create a grouping variable for the interval
  • Count the rows in each (facility, interval) group

With data.table's concise syntax, this is a one-liner.

df <- data.frame(facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
                 event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04", "2018-01-01 00:00:08", "2018-01-01 00:00:47", "2018-01-01 00:03:50", "2018-01-01 00:04:01")))

library(data.table)

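setDT(df)  # convert df to a data.table by reference
# .N is the row count per group; the interval is computed directly inside `by`
df[, .(count = .N), by = .(facility_id, interval = minute(event_time) %/% 3)]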
#>    facility_id interval count
#> 1:       20248        0     2
#> 2:       12445        0     1
#> 3:       17567        0     1
#> 4:       17567        1     1
#> 5:       10459        1     1
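
The two steps listed above can also be written separately, which may be easier to read than the one-liner. A minimal sketch, assuming the same sample data; `dt` and `counts` are illustrative names, and the UTC time zone is set only to make the example reproducible. Since no other package is loaded, `minute()` here is data.table's own helper:

```r
library(data.table)

dt <- data.table(
  facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
  event_time = as.POSIXct(
    c("2018-01-01 00:00:01", "2018-01-01 00:00:04", "2018-01-01 00:00:08",
      "2018-01-01 00:00:47", "2018-01-01 00:03:50", "2018-01-01 00:04:01"),
    tz = "UTC"
  )
)

# Step 1: add the interval column by reference (3-minute bins within the hour)
dt[, interval := minute(event_time) %/% 3L]

# Step 2: count the rows in each (facility_id, interval) group
counts <- dt[, .(count = .N), by = .(facility_id, interval)]

print(counts)
```

Keeping `interval` as a real column can be convenient if it is needed again later, at the cost of modifying `dt` in place.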

Created on 2018-01-14 by the reprex package (v0.1.1.9000).

cderv answered Dec 17 '22 22:12