I have followed a number of questions here that ask how to convert character vectors to datetime classes. I often see two approaches: strptime and the as.POSIXct/as.POSIXlt functions. I looked at the source of these functions but am unclear what the difference is.
> strptime
    function (x, format, tz = "")
    {
        y <- .Internal(strptime(as.character(x), format, tz))
        names(y$year) <- names(x)
        y
    }
    <bytecode: 0x045fcea8>
    <environment: namespace:base>

> as.POSIXct
    function (x, tz = "", ...)
    UseMethod("as.POSIXct")
    <bytecode: 0x069efeb8>
    <environment: namespace:base>

> as.POSIXlt
    function (x, tz = "", ...)
    UseMethod("as.POSIXlt")
    <bytecode: 0x03ac029c>
    <environment: namespace:base>
Doing a microbenchmark to see if there are performance differences:
    library(microbenchmark)
    Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 5000, replace = TRUE)
    df <- microbenchmark(strptime(Dates, "%d-%m-%Y"), as.POSIXlt(Dates, format = "%d-%m-%Y"), times = 1000)

    Unit: milliseconds
                                        expr      min       lq   median       uq      max
    1 as.POSIXlt(Dates, format = "%d-%m-%Y") 32.38596 33.81324 34.78487 35.52183 61.80171
    2            strptime(Dates, "%d-%m-%Y") 31.73224 33.22964 34.20407 34.88167 52.12422
strptime seems slightly faster. So what gives? Why are there two such similar functions, or are there differences between them that I missed?
The built-in as.Date function handles dates (without times); the contributed package chron handles dates and times, but does not control for time zones; and the POSIXct and POSIXlt classes allow for dates and times with control for time zones.
A POSIXct object (as created by as.POSIXct) stores both a date and a time with an associated time zone. The default time zone is the one your computer is set to, which is most often your local time zone. POSIXct stores the date and time as the number of seconds since 1 January 1970.
The basic POSIX measure of time, calendar time, is the number of seconds since the beginning of 1970, in the UTC timezone (GMT as described by the French).
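To make the "seconds since the beginning of 1970" description concrete, here is a minimal sketch. The timestamp and the time zone name "Asia/Tokyo" are arbitrary illustrative choices, not taken from the question, and the example assumes the usual time zone database is available on the system.

```r
# One day after the epoch, expressed in UTC (illustrative value)
x <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")

# The underlying number is the count of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(x)
print(secs)

# Changing the display time zone does not change the stored count
y <- x
attr(y, "tzone") <- "Asia/Tokyo"
print(format(y, usetz = TRUE))
print(as.numeric(y) == secs)
```

The time zone only affects how the instant is printed; the stored number stays the same.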
Well, the functions do different things.
First, there are two internal implementations of date/time: POSIXct, which stores seconds since the UNIX epoch (plus some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.
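The two representations can be inspected directly. This is a small sketch with an arbitrary example timestamp; it only relies on base R.

```r
# Arbitrary example instant, fixed to UTC so the result does not depend on the machine
ct <- as.POSIXct("2010-03-15 12:30:00", tz = "UTC")
lt <- as.POSIXlt(ct, tz = "UTC")

# POSIXct is a number (seconds since the epoch) with a few attributes
ct_is_numeric <- is.numeric(unclass(ct))

# POSIXlt is a list of broken-down components
lt_is_list <- is.list(unclass(lt))

# Components are accessed by name; year counts from 1900 and mon from 0
year  <- lt$year + 1900
month <- lt$mon + 1
day   <- lt$mday
hour  <- lt$hour

print(c(year = year, month = month, day = day, hour = hour))
```

Because POSIXct is a single number per element, it is usually the more convenient form for data frame columns, while POSIXlt is handy when individual fields are needed.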
strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.
as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing: in the case of character input, it acts as a wrapper around strptime.
as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing: in the case of character input, it runs strptime first, then does the conversion from POSIXlt to POSIXct.
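The relationship described above can be checked on a small input. This is a sketch under the assumption that a format string is supplied explicitly; the input strings and variable names are illustrative only.

```r
x   <- c("15-03-2010", "01-12-2010")   # illustrative dates in day-month-year order
fmt <- "%d-%m-%Y"

# Direct parse: always returns POSIXlt
lt_direct <- strptime(x, fmt, tz = "UTC")

# Generic converters applied to character input with the same format
lt_wrapped <- as.POSIXlt(x, format = fmt, tz = "UTC")
ct_wrapped <- as.POSIXct(x, format = fmt, tz = "UTC")

print(class(lt_direct))
print(class(ct_wrapped))

# All three describe the same instants
same_lt <- all(lt_direct == lt_wrapped)
same_ct <- all(as.POSIXct(lt_direct) == ct_wrapped)
print(c(same_lt, same_ct))
```

For character input with an explicit format, the practical choice is therefore mostly about which class you want back.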
It makes sense that strptime is faster, because strptime only handles character input, whilst the others first have to determine which method to use from the input type. It should also be a bit safer, in that unexpected data will tend to fail visibly (with an error or NA) instead of being handled by an intelligent conversion that might not be what you want.
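A short sketch of that difference in input handling follows. The values are illustrative, and the behaviour shown is what base R does for these particular inputs; other inputs may behave differently.

```r
# as.POSIXlt dispatches on the input type, so a Date is accepted directly
d <- as.Date("2010-03-15")
from_date <- as.POSIXlt(d)
print(class(from_date))

# as.POSIXct accepts a number of seconds when given an origin
from_num <- as.POSIXct(86400, origin = "1970-01-01", tz = "UTC")
print(format(from_num))

# strptime coerces its input to character and parses it against the format;
# in this sketch a string that does not match the format yields NA
bad <- strptime("not a date", "%d-%m-%Y", tz = "UTC")
print(is.na(bad))
```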