Format for ordinal dates (day of month with suffixes -st, -nd, -rd, -th)

Tags: date, r

Am I missing something? I can't figure out how to convert the following to Dates, where day of the month (%d) has the ordinal suffixes -st, -nd, -rd, -th:

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

?strptime doesn't appear to list a shorthand for the ordinal suffix, and it isn't handled automagically:

as.Date(ord_dates, format = c("%B %d, %Y"))
#[1] NA NA NA NA

Is there a token for handling ignored characters in the format argument? A token I'm missing?

The best I can come up with is the following (there may be a shorter regex, but it's the same idea):

as.Date(gsub("([0-9]+)(st|nd|rd|th)", "\\1", ord_dates), format = "%B %d, %Y")
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

Seems like this sort of data should be relatively common; am I missing something?

asked Aug 30 '16 21:08

MichaelChirico

1 Answer

Enjoy the power of lubridate:

library(lubridate)
mdy(ord_dates)
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

Internally, lubridate doesn't have any special conversion specifications which enable this. Rather, lubridate first uses (by smart guessing) the format "%B %dst, %Y". This gets the first element of ord_dates.

It then checks for NAs and repeats its smart guessing on the remaining elements, settling on "%B %dnd, %Y" to get the second element. It continues in this way until there are no NAs left (which happens in this case after 4 iterations), or until its smart guessing fails to turn up a likely format candidate.
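
The retry loop described above can be approximated in base R. This is only a rough sketch of the idea, not lubridate's actual implementation: parse_ordinal_iter is a hypothetical helper, the fixed list of suffixes stands in for lubridate's format guessing, and the Sys.setlocale call assumes English month names are wanted.

```r
# Assumption: month names are in English; the "C" locale gives English names for %B.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper: try one literal-suffix format at a time and
# re-parse only the elements that are still NA.
parse_ordinal_iter <- function(x, suffixes = c("st", "nd", "rd", "th")) {
  out <- rep(as.Date(NA), length(x))
  for (s in suffixes) {
    todo <- is.na(out)
    if (!any(todo)) break               # everything parsed, stop early
    fmt <- paste0("%B %d", s, ", %Y")   # e.g. "%B %dst, %Y"
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parse_ordinal_iter(c("September 1st, 2016", "September 2nd, 2016",
                     "September 3rd, 2016", "September 4th, 2016"))
```

Each pass touches only the elements left unparsed by the previous ones, so the number of passes grows with the number of distinct suffixes in the input.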

You can imagine this makes lubridate slower, and it does -- about half the speed of just using the smart regex suggested by @alistaire above:

set.seed(109123)
ord_dates <- sample(
  c("September 1st, 2016", "September 2nd, 2016",
    "September 3rd, 2016", "September 4th, 2016"),
  1e6, TRUE
  )

library(microbenchmark)

microbenchmark(times = 10L,
               lubridate = mdy(ord_dates),
               base = as.Date(sub("\\D+,", "", ord_dates),
                              format = "%B %e %Y"))
# Unit: seconds
#       expr      min       lq     mean   median       uq      max neval cld
#  lubridate 2.167957 2.219463 2.290950 2.252565 2.301725 2.587724    10   b
#       base 1.183970 1.224824 1.218642 1.227034 1.228324 1.229095    10  a
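
To see why the base expression in the benchmark parses at all, it may help to look at the intermediate strings. The sketch below uses a small made-up vector x (not the benchmark data) and again assumes English month names.

```r
# Assumption: month names are in English; the "C" locale gives English names for %B.
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("September 1st, 2016", "September 22nd, 2016")

# "\\D+," matches a run of non-digits ending in a comma. The first place
# that can happen is the ordinal suffix plus comma ("st,", "nd,"),
# because the month name is followed by a digit, not a comma.
stripped <- sub("\\D+,", "", x)
# first element becomes "September 1 2016", second "September 22 2016"

# With the suffix and comma gone, a plain format string is enough.
parsed <- as.Date(stripped, format = "%B %e %Y")
```

Unlike the gsub() version in the question, this drops the comma as well, which is why the format string here has no comma.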

The obvious advantage in lubridate's favor is its conciseness and flexibility.

answered Nov 16 '22 04:11

thepule
