Adding missing date values in a data frame with multiple observation periods

Question

Thanks in advance.

I am trying to add missing date values that were not included in a observation period for three different individuals.

My data look like this:

 IndID       Date Event Number Percent
1   P01 2011-03-04     1      2   0.390
2   P01 2011-03-11     1      2   0.975
3   P01 2011-03-13     0      9   0.795
4   P01 2011-03-14     0     10   0.516
5   P01 2011-03-15     0      1   0.117
6   P01 2011-03-17     0      7   0.093

IndID is the individual ID (P01, P03, P06). Date is obviously the Date. Event is a binary variable indicating whether an event occurred (0 = no and 1 = yes).
Columns Number and Percent are not directly relevant, but need to be preserved and are thus included here.

My sample data frame (PostData) is included below using dput.

For each IndID the first and last Date are the beginning and end of an observation period respectively, within which there are missing dates. Here, my goal is to add the missing dates for each individual and add a 0 in the Event column. The other columns (Number and Percent) can remain blank.

This post has been useful, but lacks info on my main problem - multiple individuals.

The observation period for each individual is from min(PostData$Date) to max(PostData$Date). I have been attempting to create a complete Date sequence for each individual and then merge it with the existing data frame within a for loop. There is surely a better idea.

Any suggestions are appreciated.

PostData <-structure(list(IndID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
  5L, 5L), .Label = c("P01", "P02", "P03", "P05", "P06", "P07", 
  "P08", "P09", "P10", "P11", "P12", "P13"), class = "factor"), 
  Date = structure(c(1299196800, 1299801600, 1299974400, 1300060800, 
  1300147200, 1300320000, 1300406400, 1310083200, 1310169600, 
  1310515200, 1310774400, 1310947200, 1311033600, 1311292800, 
  1311552000, 1323129600, 1323388800, 1323648000, 1323993600, 
  1324080000, 1324166400, 1324339200, 1327622400, 1327795200, 
  1327881600), class = c("POSIXct", "POSIXt"), tzone = "GMT"), 
  Event = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 
  0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L), Number = c(2L, 
  2L, 9L, 10L, 1L, 7L, 5L, 9L, 1L, 4L, 5L, 2L, 0L, 1L, 10L, 
  5L, 0L, 6L, 5L, 10L, 9L, 4L, 4L, 8L, 1L), Percent = c(0.39, 
  0.975, 0.795, 0.516, 0.117, 0.093, 0.528, 0.659, 0.308, 0.055, 
  0.185, 0.761, 0.132, 0.676, 0.368, 0.383, 0.272, 0.113, 0.974, 
  0.696, 0.941, 0.751, 0.758, 0.29, 0.15)), .Names = c("IndID", 
  "Date", "Event", "Number", "Percent"), row.names = c(NA, 25L), 
  class = "data.frame")

thelatemail · Accepted Answer

A base R version:

do.call(rbind,
  by(
    PostData,
    PostData$IndID,
    function(x) {
      out <- merge(
        data.frame(
          IndID=x$IndID[1],
          Date=seq.POSIXt(min(x$Date),max(x$Date),by="1 day")
        ),
        x,
        all.x=TRUE
      )
      out$Event[is.na(out$Event)] <- 0
      out
    }  
  )
)

Result:

       IndID       Date Event Number Percent
P01.1    P01 2011-03-04     1      2   0.390
P01.2    P01 2011-03-05     0     NA      NA
P01.3    P01 2011-03-06     0     NA      NA
P01.4    P01 2011-03-07     0     NA      NA
P01.5    P01 2011-03-08     0     NA      NA
P01.6    P01 2011-03-09     0     NA      NA
P01.7    P01 2011-03-10     0     NA      NA
P01.8    P01 2011-03-11     1      2   0.975
<<etc>>

vrajs5 · Answer

Try this.. This will add missing Dates with proper IDs and remaining fields as 0

library(data.table)
library(plyr)
dtPostData = data.table(PostData)
minmaxTab = dtPostData[,list(minDate=min(Date),maxDate=max(Date)),by=IndID]

df = lapply(1:nrow(minmaxTab),function(x) {
  temp = seq(minmaxTab$minDate[x],minmaxTab$maxDate[x],by=24*60*60) 
  temp = temp[!(temp %in% dtPostData[IndID == minmaxTab$IndID[x],]$Date)]
  data.table(IndID = minmaxTab$IndID[x], Date = temp, Event = 0, Number = 0, Percent = 0)
})

df <- ldply(x, data.frame)
df

#Results
   IndID       Date Event Number Percent
1    P01 2011-03-05     0      0       0
2    P01 2011-03-06     0      0       0
3    P01 2011-03-07     0      0       0
4    P01 2011-03-08     0      0       0
5    P01 2011-03-09     0      0       0
6    P01 2011-03-10     0      0       0
7    P01 2011-03-12     0      0       0
8    P01 2011-03-16     0      0       0
9    P03 2011-07-10     0      0       0

Adding missing date values in a data frame with multiple observation periods

Tags:

date

merge

for-loop

r

sequence

B. Davis

2 Answers

thelatemail

vrajs5

Recent Activity

Donate For Us

Adding missing date values in a data frame with multiple observation periods

Tags:

date

merge

for-loop

r

sequence

B. Davis

2 Answers

thelatemail

vrajs5

Related questions

Recent Activity

Donate For Us