
Calculating Months between Factored Time Variables

Tags:

r

I have a factored time series that looks like this:

df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", 
                   "11-JUL-2007", "11-JUL-2008"),
                 b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", 
                     "11-JUN-2002", "11-JUN-2003"))

First, I would like to convert this to a format native to R. Second, I would like to calculate the number of months between the two columns.

Update:

Essentially, I'm trying to recreate in R what I do in SPSS.

In SPSS I would:

  1. Convert the strings to date format DD-MMM-YYYY
  2. COMPUTE. RND((a-b)/60/60/24/30.416)

30.416 is shorthand for 365/12. I don't care much about month edge cases, hence the rounding.

Brandon Bertelsen asked Dec 04 '25 15:12



2 Answers

df <- data.frame(c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
                 c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
names(df) <- c("X1","X2")
df <- within(df, X1 <- as.Date(X1, format = "%d-%b-%Y"))
df <- within(df, X2 <- as.Date(X2, format = "%d-%b-%Y"))

Then difftime() will give the difference in weeks:

> with(df, difftime(X1, X2, units = "weeks"))
Time differences in weeks
[1] 265.2857 265.1429 265.1429 265.1429 265.2857

Or, using Brandon's approximation of 30.416 days per month (the output is still labelled as days, but the values are months):

> with(df, difftime(X1, X2) / 30.416)
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
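
To get closer to the SPSS RND() step from the question, the difftime can be stripped to plain numbers and rounded. This is only a sketch: it rebuilds the example dates rather than reusing df, the names days_apart and months_approx are made up for illustration, and it assumes an English locale so that %b matches "JUL" and "JUN". Dividing by 60/60/24 is not needed here because the difference is already requested in days.

```r
# Rebuild the example dates (assumes an English locale for %b)
X1 <- as.Date(c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
                "11-JUL-2007", "11-JUL-2008"), format = "%d-%b-%Y")
X2 <- as.Date(c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
                "11-JUN-2002", "11-JUN-2003"), format = "%d-%b-%Y")

# Difference in days as a plain numeric vector (drops the difftime class)
days_apart <- as.numeric(difftime(X1, X2, units = "days"))

# Approximate months, using 365/12 days per month, then round as in SPSS
months_approx <- round(days_apart / (365 / 12))
print(months_approx)
```

For these five rows every value rounds to 61.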

The closest I could get with lubridate (as highlighted by Dirk), using the df above, is:

> m <- with(df, as.period(subtract_dates(X1, X2)))
> m
[1] 5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month
> str(m)
Classes ‘period’ and 'data.frame':  5 obs. of  6 variables:
 $ year  : int  5 5 5 5 5
 $ month : int  1 1 1 1 1
 $ day   : num  0 0 0 0 0
 $ hour  : int  0 0 0 0 0
 $ minute: int  0 0 0 0 0
 $ second: num  0 0 0 0 0
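
If whole calendar months are wanted without lubridate, one possible base R sketch is to compare the year and month fields of POSIXlt objects. This is not part of the lubridate approach above; the variable names are hypothetical, only two of the example rows are used, and an English locale is assumed for %b.

```r
# Two of the example rows (assumes an English locale for %b)
x1 <- as.Date(c("11-JUL-2004", "11-JUL-2008"), format = "%d-%b-%Y")
x2 <- as.Date(c("11-JUN-1999", "11-JUN-2003"), format = "%d-%b-%Y")

# POSIXlt exposes year (since 1900), mon (0-11) and mday (1-31) fields
lt1 <- as.POSIXlt(x1)
lt2 <- as.POSIXlt(x2)

# Calendar months between the two dates
whole_months <- (lt1$year - lt2$year) * 12 + (lt1$mon - lt2$mon)

# Drop one month when the later date has not yet reached the earlier day of month
whole_months <- whole_months - (lt1$mday < lt2$mday)
print(whole_months)
```

Both rows give 61, which agrees with the "5 years and 1 month" periods shown above.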
Gavin Simpson answered Dec 07 '25 05:12



Josh is spot-on with respect to the difficulty of what a month could mean. The lubridate package has some answers on that.

In terms of base R, we can answer it for weeks though:

> df[,"pa"] <- as.POSIXct(strptime(as.character(df$a),
+                         format="%d-%B-%Y", tz="GMT"))
> df[,"pb"] <- as.POSIXct(strptime(as.character(df$b),
+                         format="%d-%B-%Y",tz="GMT"))
> df[,"weeks"] <- difftime(df$pa, df$pb, units="weeks")
> df[,"months"] <- difftime(df$pa, df$pb, units="days")/30.416
> df
            a           b         pa         pb        weeks      months
1 11-JUL-2004 11-JUN-1999 2004-07-11 1999-06-11 265.29 weeks 61.053 days
2 11-JUL-2005 11-JUN-2000 2005-07-11 2000-06-11 265.14 weeks 61.021 days
3 11-JUL-2006 11-JUN-2001 2006-07-11 2001-06-11 265.14 weeks 61.021 days
4 11-JUL-2007 11-JUN-2002 2007-07-11 2002-06-11 265.14 weeks 61.021 days
5 11-JUL-2008 11-JUN-2003 2008-07-11 2003-06-11 265.29 weeks 61.053 days
> 

This uses the altered data.frame as per my edit, so that we have proper column names. If you wrap difftime() in as.numeric(), you get plain numbers instead of difftime objects.
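
As a small illustration of the as.numeric() remark, here is a sketch on the first row only. The names pa, pb, d, days and weeks are just for this example, and an English locale is assumed for %b. The units argument of as.numeric() on a difftime is documented in ?difftime.

```r
# First row of the example data, parsed as in the transcript above
pa <- as.POSIXct(strptime("11-JUL-2004", format = "%d-%b-%Y", tz = "GMT"))
pb <- as.POSIXct(strptime("11-JUN-1999", format = "%d-%b-%Y", tz = "GMT"))

d <- difftime(pa, pb, units = "days")

# Plain number of days, without the "Time difference" label
days <- as.numeric(d)

# The same difference converted to weeks on the way out
weeks <- as.numeric(d, units = "weeks")

print(days)
print(weeks)
```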

Dirk Eddelbuettel answered Dec 07 '25 04:12



