I have a data frame of dates, stored as factors, that looks like this:
df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
"11-JUL-2007", "11-JUL-2008"),
b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
"11-JUN-2002", "11-JUN-2003"))
First, I would like to convert this to a format native to R. Second, I would like to calculate the number of months between the two columns.
Essentially I'm trying to recreate what I do in SPSS, in R.
In SPSS I would:
30.416 is short for 365/12. I don't care so much about month edge cases, hence the rounding operation.
df <- data.frame(c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
names(df) <- c("X1","X2")
df <- within(df, X1 <- as.Date(X1, format = "%d-%b-%Y"))
df <- within(df, X2 <- as.Date(X2, format = "%d-%b-%Y"))
Then difftime() will give the difference in weeks:
> with(df, difftime(X1, X2, units = "weeks"))
Time differences in weeks
[1] 265.2857 265.1429 265.1429 265.1429 265.2857
Or if we use Brandon's approximation:
> with(df, difftime(X1, X2) / 30.416)
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
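Note that the result is still labelled "days", because dividing a difftime keeps its units. As a minimal sketch of the rounding step from the question, assuming the same dates (written here in ISO format so that parsing does not depend on the locale's month abbreviations), one could convert to plain numbers first and then round. The name approx_months is only illustrative:

```r
# Same dates as in df above, written in ISO format (an assumption made
# here so the example does not depend on locale-specific month names)
X1 <- as.Date(c("2004-07-11", "2005-07-11", "2006-07-11",
                "2007-07-11", "2008-07-11"))
X2 <- as.Date(c("1999-06-11", "2000-06-11", "2001-06-11",
                "2002-06-11", "2003-06-11"))

# Difference in days as a plain numeric vector (drops the difftime class)
days <- as.numeric(difftime(X1, X2, units = "days"))

# Approximate months: divide by the average month length, then round
approx_months <- round(days / (365 / 12))
approx_months
```

For these dates every row rounds to 61.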
The closest I could get with lubridate (as highlighted by Dirk), using the above df, is:
> m <- with(df, as.period(subtract_dates(X1, X2)))
> m
[1] 5 years and 1 month 5 years and 1 month 5 years and 1 month 5 years and 1 month 5 years and 1 month
> str(m)
Classes ‘period’ and 'data.frame': 5 obs. of 6 variables:
$ year : int 5 5 5 5 5
$ month : int 1 1 1 1 1
$ day : num 0 0 0 0 0
$ hour : int 0 0 0 0 0
$ minute: int 0 0 0 0 0
$ second: num 0 0 0 0 0
Josh is spot-on with respect to the difficulty of what a month could mean. The lubridate package has some answers on that.
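One possible definition, sketched here in base R, is to count completed calendar months: take the difference in year and month fields, and subtract one if the day of the month has not yet been reached. This convention is an assumption for illustration, not the only reasonable one, and the name whole_months is hypothetical:

```r
# Same dates as in the question, in ISO format to avoid locale issues
X1 <- as.Date(c("2004-07-11", "2005-07-11", "2006-07-11",
                "2007-07-11", "2008-07-11"))
X2 <- as.Date(c("1999-06-11", "2000-06-11", "2001-06-11",
                "2002-06-11", "2003-06-11"))

# POSIXlt exposes the calendar fields: year (since 1900), mon (0-11), mday
lt1 <- as.POSIXlt(X1)
lt2 <- as.POSIXlt(X2)

# Calendar-month difference from the year and month fields
whole_months <- 12 * (lt1$year - lt2$year) + (lt1$mon - lt2$mon)

# Subtract one where the later date's day of month falls before the
# earlier date's, i.e. the last month is not yet complete
whole_months <- whole_months - (lt1$mday < lt2$mday)
whole_months
```

For the sample data this gives 61 for every row, matching the "5 years and 1 month" reported by lubridate.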
In terms of base R, we can answer it for weeks though:
> df[,"pa"] <- as.POSIXct(strptime(as.character(df$a),
+ format="%d-%B-%Y", tz="GMT"))
> df[,"pb"] <- as.POSIXct(strptime(as.character(df$b),
+ format="%d-%B-%Y",tz="GMT"))
> df[,"weeks"] <- difftime(df$pa, df$pb, unit="weeks")
> df[,"months"] <- difftime(df$pa, df$pb, unit="days")/30.416
> df
            a           b         pa         pb        weeks      months
1 11-JUL-2004 11-JUN-1999 2004-07-11 1999-06-11 265.29 weeks 61.053 days
2 11-JUL-2005 11-JUN-2000 2005-07-11 2000-06-11 265.14 weeks 61.021 days
3 11-JUL-2006 11-JUN-2001 2006-07-11 2001-06-11 265.14 weeks 61.021 days
4 11-JUL-2007 11-JUN-2002 2007-07-11 2002-06-11 265.14 weeks 61.021 days
5 11-JUL-2008 11-JUN-2003 2008-07-11 2003-06-11 265.29 weeks 61.053 days
>
This uses the altered data.frame as per my edit, so that we have proper column names. The months column is still labelled "days" because dividing a difftime keeps its units; if you wrap difftime() in as.numeric(), you get plain numbers instead.
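A minimal sketch of that as.numeric() wrapping, using just the first row of the data and the same 30.416 divisor (the variable names here are illustrative only):

```r
# First pair of dates from the data above, as POSIXct in GMT
pa <- as.POSIXct("2004-07-11", tz = "GMT")
pb <- as.POSIXct("1999-06-11", tz = "GMT")

# Wrapping difftime() in as.numeric() drops the difftime class and units
weeks  <- as.numeric(difftime(pa, pb, units = "weeks"))
months <- as.numeric(difftime(pa, pb, units = "days")) / 30.416

class(weeks)   # plain "numeric" rather than "difftime"
weeks
months
```

The values are the same as in the first row of the table above, only without the "weeks" and "days" labels.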