Is there a more elegant way to convert two-digit years to four-digit years with lubridate?

Tags: date, r, lubridate

If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st Century years and years between 69 and 99 into 20th Century years. For example:

    library(lubridate)
    mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))

gives the following output:

    Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
    Using date format %m/%d/%y.
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"

I can fix this after the fact by subtracting 100 years from the incorrect dates, turning 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone way to parse two-digit years so that they are handled correctly during parsing itself?

Update: After @JoshuaUlrich pointed me to strptime I found this question, which deals with an issue similar to mine, but using base R.

It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.
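As a rough sketch of what such a parse-time cutoff could look like, here is a small base R helper. The name `parse_mdy_pivot` and its `cutoff` argument are hypothetical and not part of lubridate or base R, and the sketch assumes every input is strictly of the form month/day/two-digit-year.

```r
# Hypothetical helper (not a lubridate or base R function): parse "m/d/yy"
# strings and choose the century while parsing, using a configurable cutoff.
parse_mdy_pivot <- function(x, cutoff = 68L) {
  # Assumes each element looks like "1/2/54" (month/day/two-digit year).
  parts <- strsplit(x, "/", fixed = TRUE)
  m  <- as.integer(vapply(parts, `[`, character(1), 1))
  d  <- as.integer(vapply(parts, `[`, character(1), 2))
  yy <- as.integer(vapply(parts, `[`, character(1), 3))

  # Two-digit years above the cutoff go to the 1900s, the rest to the 2000s.
  yyyy <- ifelse(yy > cutoff, 1900L + yy, 2000L + yy)

  as.Date(sprintf("%04d-%02d-%02d", yyyy, m, d))
}

raw <- c("1/2/54", "1/2/68", "1/2/69", "1/2/99", "1/2/04")
default_pivot <- format(parse_mdy_pivot(raw))               # same split as mdy(): 00-68 -> 20xx
pivot_50      <- format(parse_mdy_pivot(raw, cutoff = 50L)) # 51-99 -> 19xx, 00-50 -> 20xx
print(pivot_50)
```

With `cutoff = 50L`, 54 and 68 land in the 1900s while 04 stays in the 2000s, so no after-the-fact correction is needed.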

eipi10 asked Sep 07 '12 18:09




1 Answer

Here is a function that allows you to do this:

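    library(lubridate)
    x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))

    # Re-assign the century of each date: two-digit years above the cutoff
    # (the last two digits of `year`) go to the 1900s, the rest to the 2000s.
    foo <- function(x, year=1968){
      m <- year(x) %% 100
      year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
      x
    }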

Try it out:

    x
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"

    foo(x)
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"

    foo(x, 1950)
    [1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"

The bit of magic here is the modulus operator %%, which returns the remainder of a division. So 1968 %% 100 yields 68, the two-digit year.
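To make the arithmetic concrete, here is a minimal sketch of the same cutoff logic applied to plain year numbers, with no date objects involved. The variable names are illustrative only, and the pivot of 1950 mirrors the `foo(x, 1950)` call above.

```r
# Years as mdy() would produce them for 54, 68, 69, 99 and 04.
yrs <- c(2054, 2068, 1969, 1999, 2004)

two_digit <- yrs %% 100    # remainder after dividing by 100: 54 68 69 99 4
century   <- yrs %/% 100   # integer quotient:                20 20 19 19 20

# Pivot at 1950: two-digit years above 50 become 19xx, the rest 20xx.
pivot <- 1950 %% 100
fixed <- ifelse(two_digit > pivot, 1900 + two_digit, 2000 + two_digit)
print(fixed)               # 1954 1968 1969 1999 2004
```

Note that the comparison is strict, so a two-digit year equal to the pivot itself (here 50) is assigned to the 2000s.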

Andrie answered Oct 07 '22 11:10