As the title goes. Why is the lubridate function so much slower?
    library(lubridate)
    library(microbenchmark)

    Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 50000, replace = TRUE)

    microbenchmark(as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT"), times = 100)
    microbenchmark(dmy(Dates, tz ="GMT"), times = 100)

    Unit: milliseconds
                                                                 expr      min       lq   median       uq      max
    1 as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT") 103.1902 104.3247 108.6750 109.2632 149.8710
    2                                      dmy(Dates, tz = "GMT") 184.4871 194.1504 197.8422 214.3771 268.4911
For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you're less likely to get blown up and it's easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we'd have something.
Start by looking at what dmy is doing and you'll see the reason for the speed difference (by the way, from your benchmarks I wouldn't say that lubridate is that much slower, as these are in milliseconds):
    dmy
    # type this into the command line and you get:
    > dmy
    function (..., quiet = FALSE, tz = "UTC")
    {
        dates <- unlist(list(...))
        parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet, tz = tz)
    }
    <environment: namespace:lubridate>
Right away I see parse_date, num_to_date, and make_format. Makes one wonder what all these guys are. Let's see:
    parse_date
    > parse_date
    function (x, formats, quiet = FALSE, seps = find_separator(x), tz = "UTC")
    {
        fmt <- guess_format(head(x, 100), formats, seps, quiet)
        parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
        if (length(x) > 2 & !quiet)
            message("Using date format ", fmt, ".")
        failed <- sum(is.na(parsed)) - sum(is.na(x))
        if (failed > 0) {
            message(failed, " failed to parse.")
        }
        parsed
    }
    <environment: namespace:lubridate>

    num_to_date
    > getAnywhere(num_to_date)
    A single object matching ‘num_to_date’ was found
    It was found in the following places
      namespace:lubridate
    with value

    function (x)
    {
        if (is.numeric(x)) {
            x <- as.character(x)
            x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
        }
        x
    }
    <environment: namespace:lubridate>

    make_format
    > getAnywhere(make_format)
    A single object matching ‘make_format’ was found
    It was found in the following places
      namespace:lubridate
    with value

    function (order)
    {
        order <- strsplit(order, "")[[1]]
        formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", "%Y"))[order]
        grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
        lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
    }
    <environment: namespace:lubridate>
Wow, we've got strsplit-ting, expand.grid-ing, paste-ing, ifelse-ing, unname-ing, etc., plus a Whole Lotta Error Checking Going On (a play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm, tasty, but it comes with a price: speed.
Compare that to as.POSIXct:
    getAnywhere(as.POSIXct)   # tells us to use methods to see the business
    methods('as.POSIXct')     # tells us all the business
    as.POSIXct.date           # what I believe your code is using (I don't use dates though)
There's a lot more internal coding and less error checking going on with as.POSIXct.
So you have to ask: do I want ease and safety, or speed and power? It depends on the job.
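To make the trade-off concrete, here is a minimal base-R sketch of the "guess the format, then parse" idea that parse_date follows, next to a direct strptime call with a known format. The helper guess_then_parse and its candidate list are hypothetical and written only for illustration; this is not lubridate's actual guess_format logic.

```r
# Hypothetical helper (not part of lubridate): try each candidate format on a
# small sample, then parse the full vector with the first format that works.
guess_then_parse <- function(x, candidates, tz = "GMT") {
  sample_x <- head(x, 100)
  for (fmt in candidates) {
    if (!anyNA(strptime(sample_x, fmt, tz = tz))) {
      return(as.POSIXct(strptime(x, fmt, tz = tz)))
    }
  }
  stop("no candidate format matched")
}

set.seed(1)
dates <- format(seq(as.Date("2010-01-01"), by = "day", length.out = 365), "%d-%m-%Y")
x <- sample(dates, 50000, replace = TRUE)

# Direct route: the caller already knows the format, so nothing is guessed.
direct <- as.POSIXct(strptime(x, "%d-%m-%Y", tz = "GMT"))

# Convenient route: extra trial parses happen before the real one.
guessed <- guess_then_parse(x, c("%d-%b-%y", "%d-%b-%Y", "%d-%m-%Y"))

stopifnot(identical(direct, guessed))
```

Both routes return the same values. The convenient one simply does more work before it gets to the same strptime call, which is the kind of overhead described above.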
@Tyler's answer is correct. Here's some more info including a tip on making lubridate faster - from the help file:
"Lubridate has an inbuilt very fast POSIX parser, ported from the fasttime package by Simon Urbanek. This functionality is as yet optional and could be activated with options(lubridate.fasttime = TRUE). Lubridate will automatically detect POSIX strings and use fast parser instead of the default strptime utility."
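As a rough sketch of how that option would be switched on, assuming a lubridate version that still honours it (newer releases may ignore it or choose the fast parser on their own), something like the following should work. The example strings are made up and use the year-first POSIX layout the help text refers to. The day-first strings from the question are not in that layout, so it is only an assumption that they would still go through strptime.

```r
library(lubridate)

# Option name taken from the help text quoted above; whether it has any
# effect depends on the installed lubridate version (assumption).
options(lubridate.fasttime = TRUE)

# Made-up year-first timestamps, i.e. the "POSIX strings" the help mentions.
x <- c("2010-01-01 00:00:00", "2010-06-15 12:30:00")
parsed <- ymd_hms(x, tz = "GMT")
parsed
```

Re-running the benchmark from the question with and without the option is the only reliable way to tell whether it helps for a given input format.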