Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I use the lubridate package to calculate the number of months between two date vectors where one of the vectors has NA values?

Tags:

r

I've read the lubridate package manual and have queried Stack Overflow with a variety of permutations of my question but have come up with no answer to my specific problem.

What I'm trying to do is calculate age in months at time of event as the difference between date of birth and some specific event date.

As such, I imported a SAS dataset using the sas7bdat package and converted my SAS date variables (DOB and Event) to R objects using the following code:

df$DOB <- as.Date(df$DOB, origin="1960-01-01")
df$DOB1 <- ymd(df$DOB)

And same thing for the Event variable:

df$Event <- as.Date(df$Event, origin="1960-01-01")
df$Event1 <- ymd(df$Event)

However, there are some NA values for DOB. So, for the following code which I want to use to calculate age (in months).

df$interval <- new_interval(df$DOB1,df$Event1)
df$Age1 <- df$interval %/% months(1)

I'm receiving the error:

Error in est[start + est * per < end] <- est[start + est * per < end] + : NAs are not allowed in subscripted assignments

What am I doing wrong? I've tried an if/else function but perhaps used it incorrectly.

(Note: For the SAS programmers out there, I'm trying to produce the same results as the following function:

IF DOB ne . THEN Tage=Floor(intck('month',DOB,Event)-(Day(Event)<Day(DOB)));
like image 575
mcjudd Avatar asked Aug 18 '14 18:08

mcjudd


3 Answers

Simple example using lubridate package

library(lubridate)
date1='20160101'
date2='20160501'
x=interval(ymd(date1),ymd(date2))
x= x %/% months(1)
print(x)
# answer : 4

or follows is same:

x=as.period(x) %>% month()
print(x)
# answer : 4
like image 165
hyunwoo jeong Avatar answered Oct 10 '22 02:10

hyunwoo jeong


Well so I give all credit for this answer to my talented work colleague. I neglected to include a reproducible example because whenever I would write a simple approximation of my problem, the df$Age1 <- df$interval %/% months(1) always worked! This left me totally stumped. It wasn't until I actually ran the code on my dataframe of 650,000+ birthdates and event dates that the error message...

Error in est[start + est * per < end] <- est[start + est * per < end] + : NAs are not allowed in subscripted assignments

... would even come up! My colleague had the idea to process this calculation iteratively with the following function:

df$Age1 = rep(NA, nrow(df))
for (i in 1:nrow(df)) {
   df$Age1[i]<- df$interval[i] %/% months(1)
                      }
df$Age1[1:15]

Using my dataframe, it became plain to see that this calculation got hung up on row 13!

> df$interval[13]
[1] 1995-10-31 19:00:00 EST--1996-05-26 20:00:00 EDT

So we aren't certain, but maybe the fact that the df$DOB[13] is 10/31 is screwing it up. This sort of problem with the lubridate package has been reported before (i.e., lubridate not being able to divide intervals by a period when one of the dates is at the end of the month):

https://github.com/hadley/lubridate/issues/235

The way we came to a solution was by using as.period and then converting it to months:

df$Age1<- as.period(df$interval)
head(df$Age1)

[1] "1y 2m 26d 0H 0M 0S" "6m 15d 23H 0M 0S"  
[3] "4m 9d 23H 0M 0S"    "3m 19d 23H 0M 0S"  
[5] "3y 0m 25d 0H 0M 0S" "1y 1m 29d 1H 0M 0S"

df$Age1 <- df$Age1 %/% months(1)
head(df$Age1)

[1] 14  6  4  3 36 13
like image 35
mcjudd Avatar answered Oct 10 '22 01:10

mcjudd


Here is another example of this reported issue with lubridate (1.3.3). Note that there may be different error messages depending on what else is in the dataset, and the issue seems to be dependent on the unit of measure (in my case months worked whereas years did not).

dat <- as.data.frame(list(Start = as.Date(c("1942-08-09", "1956-02-29")),
                          End   = as.Date(c("2007-07-31", "2007-09-13"))))

int0 <- with(dat, new_interval(Start, End))
as.period(int0, unit = "years")
"Error in est[start + est * per > end] <- est[start + est * per > end] -  : 
  NAs are not allowed in subscripted assignments"

int1 <- with(dat[1,], new_interval(Start, End))
as.period(int1, unit = "years")
[1] "64y 11m 22d 0H 0M 0S"

int2 <- with(dat[2,], new_interval(Start, End))
as.period(int2, unit = "years")
"Error in while (any(start + est * per > end)) est[start + est * per >  : 
  missing value where TRUE/FALSE needed"

as.period(int0) %/% years(1)
[1] 64 51

as.period(int0, unit = "months")
[1] "779m 22d 0H 0M 0S" "618m 15d 0H 0M 0S"
like image 40
JWilliman Avatar answered Oct 10 '22 00:10

JWilliman