Fill dataframe dates by variable in R

Question

I have a dummy dataset covering 10 Hospitals. Each row records how many jobs there were at a given Hospital on a specific date. Records are taken weekly, and a missing date means 0 jobs that week.

set.seed(2020)

df1 <- data.frame(
  Date = as.Date(sample(as.numeric(as.Date('2011-01-01')):as.numeric(as.Date('2013-04-14')),
                        10, replace = TRUE),
                 origin = '1970-01-01'),
  Hospital = sample(1:10, replace = TRUE),
  Jobs = rpois(10, 2))

I would like to fill in the missing weekly dates for each Hospital so that every Hospital has 120 entries (there are 120 weeks between 2011-01-01 and 2013-04-14), with the 'Jobs' variable set to 0 for the new dates. The output should be a dataframe with 1200 rows (10 Hospitals, each with 120 weekly entries).

Note: I have tried a solution along the lines of "R fill missing dates by category", but it only fills in the missing dates between the min and max already present in the data, not across the full date range defined above. I have also tried manually adding start and end dates to the data for each Hospital, applying that solution, and then removing them again, but this does not work as intended.

starja · Accepted Answer

Is your date for one week always the same weekday? Your example data draws the weekday randomly. I have a solution that works, but only if the weekday of the date is always the same. If this is not the case, you would have to do a bit more work to clean your input data.
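If the weekday does vary, one possible cleaning step is to snap every date back to the most recent date on the weekly grid. The following is only a sketch under the assumption that the grid starts at 2011-01-01 and advances in 7-day steps; `snap_to_week` is a hypothetical helper name, not something from the question or from a package.

```r
# Hypothetical helper: map each date to the latest grid date on or before it.
# Assumes the weekly grid starts at `start` and advances in 7-day steps.
snap_to_week <- function(dates, start = as.Date("2011-01-01")) {
  days_since_start <- as.numeric(dates - start)  # whole days since grid start
  start + 7 * (days_since_start %/% 7)           # round down to a full week
}

raw_dates <- as.Date(c("2011-01-01", "2011-01-05", "2011-01-08", "2013-04-14"))
snapped <- snap_to_week(raw_dates)
snapped
# "2011-01-01" "2011-01-01" "2011-01-08" "2013-04-13"
```

If two records for the same Hospital land in the same week after snapping, they would still need to be combined (for example by summing Jobs) before joining.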

Generate test data taken on the same weekday:

set.seed(2020)

# All candidate dates are Saturdays, so they lie on the weekly grid that starts
# at 2011-01-01 (the last grid date is 2013-04-13; 2013-04-14 is a Sunday).
df1 <- data.frame(
  Date = as.Date(sample(c(as.numeric(as.Date('2011-01-01')),
                          as.numeric(as.Date('2011-12-17')),
                          as.numeric(as.Date('2012-04-21')),
                          as.numeric(as.Date('2012-09-15')),
                          as.numeric(as.Date('2011-04-16')),
                          as.numeric(as.Date('2013-04-13'))), 10, replace = TRUE),
                 origin = '1970-01-01'),
  Hospital = sample(1:10, replace = TRUE),
  Jobs = rpois(10, 2))

Then, generate a data.frame with all the desired dates (starting at 2011-01-01) for all hospitals:

date_df <- data.frame(Date = rep(seq(as.Date("2011/01/01"), by = "week",
                                 length.out = 120),
                                 times = 10),
                      Hospital = rep(1:10, each = 120))
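A quick sanity check on the grid can catch off-by-one problems before joining. This sketch rebuilds the same grid and inspects its size, its date range and its weekday; it assumes your platform's `format()` supports `%u` (ISO weekday number, Monday = 1), which is typical but not guaranteed everywhere.

```r
# Rebuild the grid: 120 weekly dates, repeated for each of the 10 hospitals.
date_df <- data.frame(
  Date = rep(seq(as.Date("2011-01-01"), by = "week", length.out = 120),
             times = 10),
  Hospital = rep(1:10, each = 120)
)

nrow(date_df)                        # 1200
range(date_df$Date)                  # "2011-01-01" "2013-04-13"
unique(format(date_df$Date, "%u"))   # "6", i.e. every grid date is a Saturday
```

Note that the 120th weekly date is 2013-04-13, so a record dated 2013-04-14 would not match any grid row.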

Now you can join the two data.frames. right_join returns all rows from the second data.frame, so every date is covered for every Hospital; rows without a match in df1 get NA for Jobs. Finally, replace those NAs with 0:

library(dplyr)
# join on both keys explicitly; keeps every row of date_df
df_join <- right_join(df1, date_df, by = c("Date", "Hospital")) %>%
  mutate(Jobs = ifelse(is.na(Jobs), 0, Jobs))
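For comparison, the same idea can be sketched in base R without dplyr. This is not the method from the answer above, just an equivalent left join using `merge()` with `all.x = TRUE`; the small `obs` data frame is made-up input so that the result can be checked by hand.

```r
# Made-up input: three observed records, all on the weekly grid.
obs <- data.frame(
  Date     = as.Date(c("2011-01-01", "2011-01-15", "2011-01-08")),
  Hospital = c(1L, 1L, 2L),
  Jobs     = c(3L, 1L, 2L)
)

# Full grid: every weekly date paired with every hospital.
weeks <- seq(as.Date("2011-01-01"), by = "week", length.out = 120)
grid  <- expand.grid(Date = weeks, Hospital = 1:10)

# Keep every grid row; Jobs is NA where no record exists, then set to 0.
filled <- merge(grid, obs, by = c("Date", "Hospital"), all.x = TRUE)
filled$Jobs[is.na(filled$Jobs)] <- 0L
filled <- filled[order(filled$Hospital, filled$Date), ]

nrow(filled)      # 1200
sum(filled$Jobs)  # 6, the original job counts are preserved
```

Either way, the row count only comes out at exactly 1200 if the input has at most one record per Date and Hospital; duplicate pairs would add extra rows to the joined result.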
