If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st-century years and years between 69 and 99 into 20th-century years. For example:
    library(lubridate)
    mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))
gives the following output:
    Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
    Using date format %m/%d/%y.
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"
I can fix this after the fact by subtracting 100 from the incorrect dates to turn 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone method of parsing two-digit dates so that they get handled correctly in the parsing process itself?
Update: After @JoshuaUlrich pointed me to strptime, I found this question, which deals with an issue similar to mine, but using base R.
It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.
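As a rough illustration of what such a cutoff could look like, here is a minimal base R sketch. The helper name parse_mdy_cutoff and its cutoff argument are hypothetical (neither lubridate nor base R provides them), and it assumes every input is an m/d/yy string with a two-digit year:

```r
# Hypothetical helper (not part of lubridate or base R): expand two-digit
# years to four digits using a caller-chosen cutoff, then parse with %Y.
parse_mdy_cutoff <- function(x, cutoff = 68) {
  # Two-digit year at the end of each "m/d/yy" string
  yy <- as.integer(sub("^.*/", "", x))
  # Years above the cutoff go to the 1900s, the rest to the 2000s
  full_year <- ifelse(yy > cutoff, 1900L + yy, 2000L + yy)
  # Swap the two-digit year for the four-digit one and parse
  as.Date(paste0(sub("[0-9]+$", "", x), full_year), format = "%m/%d/%Y")
}

dates <- c("1/2/54", "1/2/68", "1/2/69", "1/2/99", "1/2/04")
parsed <- parse_mdy_cutoff(dates, cutoff = 50)
format(parsed, "%Y")
# [1] "1954" "1968" "1969" "1999" "2004"
```

Because the year is already four digits when it reaches as.Date(), the parser never has to guess the century.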
Here is a function that allows you to do this:
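    library(lubridate)
    x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))

    # Re-assign the century: two-digit years above the cutoff go to the 1900s,
    # all others to the 2000s. `year` is the cutoff (only its last two digits matter).
    foo <- function(x, year=1968){
      m <- year(x) %% 100    # two-digit part of each parsed year
      year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
      x
    }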
Try it out:
    x
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"

    foo(x)
    [1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"

    foo(x, 1950)
    [1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
    [5] "2004-01-02 UTC"
The bit of magic here is to use the modulus operator %% to return the remainder of a division. So 1968 %% 100 yields 68.
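To see the arithmetic on its own, here is a small base R sketch that applies the same remainder-and-compare step to plain numbers instead of dates. The variable names are only for illustration:

```r
# Years as mdy() produced them in the example above
years <- c(2054, 2068, 1969, 1999, 2004)

# %% is vectorised: keep only the last two digits of each year
two_digit <- years %% 100        # 54 68 69 99 4

# Reduce the cutoff year the same way
cutoff <- 1950 %% 100            # 50

# Two-digit years above the cutoff go to the 1900s, the rest to the 2000s
fixed <- ifelse(two_digit > cutoff, 1900 + two_digit, 2000 + two_digit)
fixed
# [1] 1954 1968 1969 1999 2004
```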