
hourly sums with dplyr with zeros for empty hours

Tags: r, dplyr

I have a dataset in the same format as "my_data" below, where each row records a single event. I want a summary of how many events occur in each hour, and every hour with no events should be included with an "hourly_total" of 0.

I can get the hourly totals with dplyr as shown, but the empty hours are dropped instead of being set to 0.

Thank you!

set.seed(123)
library(dplyr)
library(lubridate)

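# Simulate N sorted event times, uniformly distributed between st and et
latemail <- function(N, st = "2012/01/01", et = "2012/01/31") {
    st <- as.POSIXct(as.Date(st))
    et <- as.POSIXct(as.Date(et))
    dt <- as.numeric(difftime(et, st, units = "secs"))
    ev <- sort(runif(N, 0, dt))
    st + ev  # return the timestamps visibly rather than via an assignment
}

# One row per event; tibble() replaces the deprecated data_frame()
my_data <- tibble(fake_times = latemail(25),
                  count = 1)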

my_data %>%
    group_by(rounded_hour = floor_date(fake_times, unit = "hour")) %>%
    summarise(hourly_total = sum(count))
Michael asked Mar 17 '23 18:03



1 Answer

Assign your counts to an object:

counts <- my_data %>%
    group_by(rounded_hour = floor_date(fake_times, unit = "hour")) %>%
    summarise(hourly_total = sum(count))

Create a data frame containing every hour between the first and last event:

complete_data <- data.frame(hour = seq(floor_date(min(my_data$fake_times), unit = "hour"),
                                       floor_date(max(my_data$fake_times), unit = "hour"),
                                       by = "hour"))

Left-join the counts onto it and replace the resulting NAs with 0:

complete_data %>%
    rename(rounded_hour = hour) %>%            # match the key column name used in counts
    left_join(counts, by = "rounded_hour") %>% # hours without events get NA
    mutate(hourly_total = ifelse(is.na(hourly_total), 0, hourly_total))
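
The same three steps can probably be collapsed into one pipeline with tidyr's complete(), which expands a column to a supplied set of values and fills the new rows. This is a sketch, not part of the approach above: it assumes tidyr is installed, regenerates the sample data in UTC so the block runs on its own and avoids time-zone effects, and uses the made-up names start and hourly.

```r
library(dplyr)
library(tidyr)
library(lubridate)

set.seed(123)

# Assumed sample data: 25 random event times within 30 days, in UTC
start <- as.POSIXct("2012-01-01", tz = "UTC")
my_data <- tibble(
  fake_times = start + sort(runif(25, 0, 30 * 24 * 3600)),
  count = 1
)

hourly <- my_data %>%
  group_by(rounded_hour = floor_date(fake_times, unit = "hour")) %>%
  summarise(hourly_total = sum(count)) %>%
  # Expand to every hour between the first and last observed hour,
  # setting hourly_total to 0 for the hours that had no events
  complete(
    rounded_hour = seq(min(rounded_hour), max(rounded_hour), by = "hour"),
    fill = list(hourly_total = 0)
  )

head(hourly)
```

As with the join, this only covers hours between the first and last event. To cover a fixed reporting window, pass explicit start and end times to seq() instead of min() and max().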
Gregor Thomas answered Mar 20 '23 15:03
