I have some dates, which I can extract the day of the month from:
trimws(format(seq.Date(
from = as.Date("2016-01-01"),
to = as.Date("2016-10-01"), by = "day"), "%e"))
I would like to format the dates with suffixes "th", "rd" or "nd" as appropriate. So, "1st", "2nd", "3rd", etc. Is there an easy way to accomplish this, or will I have to enumerate the rules?
I can implement this as a brute force lookup:
df_dates = data_frame(
day = seq.int(31),
suffix = c(
"st",
"nd",
"rd",
rep("th", 17),
"st",
"nd",
"rd",
rep("th", 7),
"st"
)
)
but a more elegant solution would be welcome.
Here is a tidyverse solution, using the vectorized SQL style if-else function case_when
.
library(dplyr)
library(lubridate)
append_date_suffix <- function(dates){
dayy <- day(dates)
suff <- case_when(dayy %in% c(11,12,13) ~ "th",
dayy %% 10 == 1 ~ 'st',
dayy %% 10 == 2 ~ 'nd',
dayy %% 10 == 3 ~'rd',
TRUE ~ "th")
paste0(dayy, suff)
}
Testing it using today's date
append_date_suffix(as.Date(-10:10, now()))
[1] "4th" "5th" "6th" "7th" "8th" "9th" "10th"
[8] "11th" "12th" "13th" "14th" "15th" "16th" "17th"
[15] "18th" "19th" "20th" "21st" "22nd" "23rd" "24th"
As requested, timings:
library(microbenchmark)
microbenchmark(scales::ordinal(as.Date(-1000:1000, now())),
append_date_suffix(as.Date(-1000:1000, now())))
Unit: milliseconds
expr min lq mean median uq max neval
scales::ordinal(as.Date(-1000:1000, now())) 45.89437 46.408347 47.316820 46.734974 48.228251 53.14592 100
append_date_suffix(as.Date(-1000:1000, now())) 1.39770 1.451481 1.549895 1.490646 1.530105 3.52757 100
The actual timings requested are below. We're not measuring the speed of as.Date()
and we need to ensure both methods output the same thing:
ads_cw <- function(dates){
dayy <- day(dates)
suff <- case_when(dayy %in% c(11,12,13) ~ "th",
dayy %% 10 == 1 ~ 'st',
dayy %% 10 == 2 ~ 'nd',
dayy %% 10 == 3 ~'rd',
TRUE ~ "th")
paste0(dayy, suff)
}
ads_so <- function(dates) {
dayy <- day(dates)
scales::ordinal(dayy)
}
dates <- as.Date(-1000:1000, now())
microbenchmark(ads_cw(dates), ads_so(dates))
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## ads_cw(dates) 1.226038 1.267377 1.526139 1.329442 1.505056 3.180228 100 a
## ads_so(dates) 7.270987 7.632697 8.275644 8.077106 8.816440 10.571275 100 b
The answer code is still faster than scales::ordinal
but the benchmark is now honest.
Of note, If you want to make a comparison using just numeric vectors, it is still ~ 7 times faster.
just_nums <- function(n){
suff <- case_when(n %in% c(11,12,13) ~ "th",
n %% 10 == 1 ~ 'st',
n %% 10 == 2 ~ 'nd',
n %% 10 == 3 ~'rd',
TRUE ~ "th")
paste0(n, suff)
}
microbenchmark(scales::ordinal(1:1000),
just_nums(1:1000))
Unit: microseconds
expr min lq mean median uq max neval
scales::ordinal(1:1000) 4411.144 4483.191 5055.2170 4560.647 4738.355 45206.038 100
just_nums(1:1000) 666.407 687.305 788.3066 713.319 746.347 1808.943 100
Here is a little help:
getOrdinalNumber <- function(num) {
result <- ""
if (!(num %% 100 %in% c(11, 12, 13))) {
result <- switch(as.character(num %% 10),
"1" = {paste0(num, "st")},
"2" = {paste0(num, "nd")},
"3" = {paste0(num, "rd")},
paste0(num,"th"))
} else {
result <- paste0(num, "th")
}
result
}
The function works the following way:
num %% 100
indicates x mod y, so you check the remainder after division of one number by another. So for example 21 %% 100
is 21. So 21 is NOT %in% c(11,12,13)
, but !
makes the statement TRUE
and the switch
argument adds a "st"
If we have num <- 11
, the first check 11 %% 100
is 11 and so a "th" is added (so we are in the else
loop)
That's just a starting point for you, because you can use this function to do this not only for single numbers, but for whole vectors. But that's your work to do :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With