Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Add "th", "rd" and "nd" to dates

Tags:

r

I have some dates, which I can extract the day of the month from:

trimws(format(seq.Date(
  from = as.Date("2016-01-01"),
  to = as.Date("2016-10-01"), by = "day"), "%e"))

I would like to format the dates with suffixes "th", "rd" or "nd" as appropriate. So, "1st", "2nd", "3rd", etc. Is there an easy way to accomplish this, or will I have to enumerate the rules?


I can implement this as a brute force lookup:

df_dates = data_frame(
  day = seq.int(31),
  suffix = c(
    "st",
    "nd",
    "rd",
    rep("th", 17),
    "st",
    "nd",
    "rd",
    rep("th", 7),
    "st"
  )
)

but a more elegant solution would be welcome.

like image 567
tchakravarty Avatar asked Oct 14 '16 09:10

tchakravarty


2 Answers

Here is a tidyverse solution, using the vectorized SQL style if-else function case_when.

library(dplyr)
library(lubridate)

append_date_suffix <- function(dates){
  dayy <- day(dates)
  suff <- case_when(dayy %in% c(11,12,13) ~ "th",
                    dayy %% 10 == 1 ~ 'st',
                    dayy %% 10 == 2 ~ 'nd',
                    dayy %% 10 == 3 ~'rd',
                    TRUE ~ "th")
  paste0(dayy, suff)
}

Testing it using today's date

append_date_suffix(as.Date(-10:10, now()))

 [1] "4th"  "5th"  "6th"  "7th"  "8th"  "9th"  "10th" 
 [8] "11th" "12th" "13th" "14th" "15th" "16th" "17th"
[15] "18th" "19th" "20th" "21st" "22nd" "23rd" "24th"

As requested, timings:

library(microbenchmark)
microbenchmark(scales::ordinal(as.Date(-1000:1000, now())),
               append_date_suffix(as.Date(-1000:1000, now())))

Unit: milliseconds
                                           expr      min        lq      mean    median        uq      max neval
    scales::ordinal(as.Date(-1000:1000, now())) 45.89437 46.408347 47.316820 46.734974 48.228251 53.14592   100
 append_date_suffix(as.Date(-1000:1000, now()))  1.39770  1.451481  1.549895  1.490646  1.530105  3.52757   100

The actual timings requested are below. We're not measuring the speed of as.Date() and we need to ensure both methods output the same thing:

ads_cw <- function(dates){
  dayy <- day(dates)
  suff <- case_when(dayy %in% c(11,12,13) ~ "th",
                    dayy %% 10 == 1 ~ 'st',
                    dayy %% 10 == 2 ~ 'nd',
                    dayy %% 10 == 3 ~'rd',
                    TRUE ~ "th")
  paste0(dayy, suff)
}

ads_so <- function(dates) {
  dayy <- day(dates)
  scales::ordinal(dayy)
}

dates <- as.Date(-1000:1000, now())
microbenchmark(ads_cw(dates), ads_so(dates))
## Unit: milliseconds
##           expr      min       lq     mean   median       uq       max neval cld
##  ads_cw(dates) 1.226038 1.267377 1.526139 1.329442 1.505056  3.180228   100  a 
##  ads_so(dates) 7.270987 7.632697 8.275644 8.077106 8.816440 10.571275   100   b

The answer code is still faster than scales::ordinal but the benchmark is now honest.

Of note, If you want to make a comparison using just numeric vectors, it is still ~ 7 times faster.

just_nums <- function(n){

  suff <- case_when(n %in% c(11,12,13) ~ "th",
                    n %% 10 == 1 ~ 'st',
                    n %% 10 == 2 ~ 'nd',
                    n %% 10 == 3 ~'rd',
                    TRUE ~ "th")
  paste0(n, suff)
}

microbenchmark(scales::ordinal(1:1000),
               just_nums(1:1000))

Unit: microseconds
                    expr      min       lq      mean   median       uq       max neval
 scales::ordinal(1:1000) 4411.144 4483.191 5055.2170 4560.647 4738.355 45206.038   100
       just_nums(1:1000)  666.407  687.305  788.3066  713.319  746.347  1808.943   100
like image 92
shayaa Avatar answered Sep 24 '22 12:09

shayaa


Here is a little help:

getOrdinalNumber <- function(num) {
   result <- ""
  if (!(num %% 100 %in% c(11, 12, 13))) {
    result <- switch(as.character(num %% 10), 
                     "1" = {paste0(num, "st")}, 
                     "2" = {paste0(num, "nd")},
                     "3" = {paste0(num, "rd")},
                     paste0(num,"th"))
      } else {
        result <- paste0(num, "th")
      }
    result
}

The function works the following way:

num %% 100 indicates x mod y, so you check the remainder after division of one number by another. So for example 21 %% 100 is 21. So 21 is NOT %in% c(11,12,13), but ! makes the statement TRUE and the switch argument adds a "st"

If we have num <- 11, the first check 11 %% 100 is 11 and so a "th" is added (so we are in the else loop)

That's just a starting point for you, because you can use this function to do this not only for single numbers, but for whole vectors. But that's your work to do :-)

like image 38
J_F Avatar answered Sep 25 '22 12:09

J_F