I have a column of time stamps in character format that looks like this:
2015-09-24 06:00:00 UTC
2015-09-24 05:00:00 UTC
dateTimeZone <- c("2015-09-24 06:00:00 UTC","2015-09-24 05:00:00 UTC")
I'd like to convert this character data into time data using POSIXct, and if I knew that all the time stamps were in UTC, I would do it like this:
dateTimeZone <- as.POSIXct(dateTimeZone, tz="UTC")
However, I don't necessarily know that all the time stamps are in UTC, so I tried
dateTimeZone <- as.POSIXct(dateTimeZone, format = "%Y-%m-%d %H:%M:%S %Z")
However, because strptime supports %Z only for output, this returns the following error:
Error in strptime(x, format, tz = tz) : use of %Z for input is not supported
I checked the documentation for the lubridate package, and I couldn't see that it handled this issue any differently than POSIXct.
Is my only option to check the time zone of each row and then use the appropriate time zone with something like the following?
temp[grepl("UTC",dateTimeZone)] <- as.POSIXct(dateTimeZone[grepl("UTC",dateTimeZone)], tz="UTC")
temp[grepl("PDT",dateTimeZone)] <- as.POSIXct(dateTimeZone[grepl("PDT",dateTimeZone)], tz="America/Los_Angeles")
You can get there by parsing each row in its own time zone and then converting everything to a consistent UTC time. (Edited to also match the time zone abbreviations to full Olson time zone names.)
dates <- c(
"2015-09-24 06:00:00 UTC",
"2015-09-24 05:00:00 PDT"
)
#extract timezone from dates
datestz <- vapply(strsplit(dates," "), tail, 1, FUN.VALUE="")
## Make a master list mapping each abbreviation to its
## full timezone name. An arbitrary winter and summer date
## are used to try to catch daylight saving abbreviations.
tzabbrev <- vapply(
OlsonNames(),
function(x) c(
format(as.POSIXct("2000-01-01",tz=x),"%Z"),
format(as.POSIXct("2000-07-01",tz=x),"%Z")
),
FUN.VALUE=character(2)
)
tmp <- data.frame(Olson=OlsonNames(), t(tzabbrev), stringsAsFactors=FALSE)
final <- unique(data.frame(tmp[1], abbrev=unlist(tmp[-1]), stringsAsFactors=FALSE))
## Do the matching:
out <- Map(as.POSIXct, dates, tz=final$Olson[match(datestz,final$abbrev)])
as.POSIXct(unlist(out), origin="1970-01-01", tz="UTC")
# 2015-09-24 06:00:00 UTC 2015-09-24 05:00:00 PDT
#"2015-09-24 06:00:00 GMT" "2015-09-24 12:00:00 GMT"
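One caveat with this lookup: abbreviations are not unique, and match() returns only the first hit, so an abbreviation shared by several zones resolves to whichever zone comes first in OlsonNames(). As a rough sketch (the names abbr_for and lookup are mine, not part of the code above, and the two reference dates are the same arbitrary choice), you can list the candidates for an abbreviation before trusting the match:

```r
# Rebuild an abbreviation -> Olson name table in long form.
zones <- OlsonNames()

# Abbreviation each zone reports on a given date (hypothetical helper).
abbr_for <- function(date) {
  vapply(zones,
         function(z) format(as.POSIXct(date, tz = z), "%Z"),
         character(1), USE.NAMES = FALSE)
}

lookup <- unique(data.frame(
  Olson  = rep(zones, 2),
  abbrev = c(abbr_for("2000-01-01"), abbr_for("2000-07-01")),
  stringsAsFactors = FALSE
))

# How many zones share each abbreviation, most ambiguous first.
head(sort(table(lookup$abbrev), decreasing = TRUE))

# Every zone that reported "PDT" on one of the two dates.
lookup$Olson[lookup$abbrev == "PDT"]
```

If an abbreviation maps to zones with different rules, it is probably safer to hard-code the zone you actually mean for that abbreviation.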
A data.table solution:
library(data.table)
data <- data.table(dateTimeZone=c("2015-09-24 06:00:00 UTC",
"2015-09-24 05:00:00 America/Los_Angeles"))
data[, timezone:=tstrsplit(dateTimeZone, split=" ")[[3]]]
data[, datetime.local:=as.POSIXct(dateTimeZone, tz=timezone), by=timezone]
data[, datetime.utc:=format(datetime.local, tz="UTC")]
The key thing is to split the data on the timezone field so that you can feed each set of timezones to as.POSIXct separately (I'm not really sure why as.POSIXct won't let you give it a vector of timezones, actually). Here I make use of data.table's efficient split-apply-combine syntax, but you could apply the same general idea with base R or using dplyr.
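For comparison, the same split-apply-combine idea can be sketched in base R. This is a minimal sketch rather than a drop-in replacement: it assumes, as in the data.table example, that every string ends in a full Olson name, and the variable names (x, zone, utc) are made up for illustration.

```r
# Hypothetical input: timestamp followed by a full Olson zone name.
x <- c("2015-09-24 06:00:00 UTC",
       "2015-09-24 05:00:00 America/Los_Angeles")

# The zone is the third space-separated field.
zone <- vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 3)

# Pre-allocate the result as UTC, then fill it one zone at a time.
utc <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = "UTC")
for (z in unique(zone)) {
  idx <- zone == z
  # Parse only the first 19 characters (the timestamp) in that zone.
  utc[idx] <- as.POSIXct(substr(x[idx], 1, 19),
                         format = "%Y-%m-%d %H:%M:%S", tz = z)
}

format(utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Filling a pre-allocated UTC vector avoids having to combine results that carry different time zones, since a POSIXct vector stores a single tzone attribute for the whole vector.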