
Reading timestamp data in R from multiple time zones

I have a column of time stamps in character format that looks like this:

2015-09-24 06:00:00 UTC

2015-09-24 05:00:00 UTC

dateTimeZone <- c("2015-09-24 06:00:00 UTC","2015-09-24 05:00:00 UTC")

I'd like to convert this character data into time data using POSIXct, and if I knew that all the time stamps were in UTC, I would do it like this:

dateTimeZone <- as.POSIXct(dateTimeZone, tz="UTC")

However, I don't necessarily know that all the time stamps are in UTC, so I tried

dateTimeZone <- as.POSIXct(dateTimeZone, format = "%Y-%m-%d %H:%M:%S %Z")

However, because strptime supports %Z only for output, this returns the following error:

Error in strptime(x, format, tz = tz) : use of %Z for input is not supported
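
To see the asymmetry (a minimal sketch of my own, not part of the original question): %Z is honoured when formatting output, but using it in a parse format triggers the error above.

# %Z works on output ...
format(as.POSIXct("2015-09-24 06:00:00", tz = "UTC"), "%Y-%m-%d %H:%M:%S %Z")
# "2015-09-24 06:00:00 UTC"

# ... but not on input; this reproduces the error shown above:
# strptime("2015-09-24 06:00:00 UTC", "%Y-%m-%d %H:%M:%S %Z")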

I checked the documentation for the lubridate package, and I couldn't see that it handled this issue any differently than POSIXct.

Is my only option to check the time zone of each row and then use the appropriate time zone with something like the following?

temp[grepl("UTC",datetimezone)] <- as.POSIXct(datetimezone, tz="UTC")
temp[grepl("PDT",datetimezone)] <- as.POSIXct(datetimezone, tz="America/Los_Angeles")
asked Sep 30 '15 by Derek


2 Answers

You can get there by checking each row, processing it accordingly, and then putting everything back into a consistent UTC time. (Edited to include matching the time zone abbreviations to their full time zone specifications.)

dates <- c(
  "2015-09-24 06:00:00 UTC",
  "2015-09-24 05:00:00 PDT"
)

#extract timezone from dates
datestz <- vapply(strsplit(dates," "), tail, 1, FUN.VALUE="")

## Make a master list of abbreviation to 
## full timezone names. Used an arbitrary summer
## and winter date to try to catch daylight savings timezones.

tzabbrev <- vapply(
  OlsonNames(),
  function(x) c(
    format(as.POSIXct("2000-01-01",tz=x),"%Z"),
    format(as.POSIXct("2000-07-01",tz=x),"%Z")
  ),
  FUN.VALUE=character(2)
)
tmp <- data.frame(Olson=OlsonNames(), t(tzabbrev), stringsAsFactors=FALSE)
final <- unique(data.frame(tmp[1], abbrev=unlist(tmp[-1])))

## Do the matching:
out <- Map(as.POSIXct, dates, tz=final$Olson[match(datestz,final$abbrev)])
as.POSIXct(unlist(out), origin="1970-01-01", tz="UTC")
#  2015-09-24 06:00:00 UTC   2015-09-24 05:00:00 PDT 
#"2015-09-24 06:00:00 GMT" "2015-09-24 12:00:00 GMT" 
answered Nov 15 '22 by thelatemail


A data.table solution:

library(data.table)

data <- data.table(dateTimeZone=c("2015-09-24 06:00:00 UTC",
                                  "2015-09-24 05:00:00 America/Los_Angeles"))
data[, timezone:=tstrsplit(dateTimeZone, split=" ")[[3]]]
data[, datetime.local:=as.POSIXct(dateTimeZone, tz=timezone), by=timezone]
data[, datetime.utc:=format(datetime.local, tz="UTC")]

The key thing is to split the data on the timezone field so that you can feed each set of timezones to as.POSIXct separately (I'm not really sure why as.POSIXct won't let you give it a vector of timezones, actually). Here I make use of data.table's efficient split-apply-combine syntax, but you could apply the same general idea with base R or using dplyr.
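
For reference, a rough base R version of the same split-apply-combine idea (a sketch of my own, not part of the answer):

dateTimeZone <- c("2015-09-24 06:00:00 UTC",
                  "2015-09-24 05:00:00 America/Los_Angeles")

# pull out the zone name, split the strings by zone, parse each
# group with its own tz, then recombine the underlying times as UTC
tzs <- vapply(strsplit(dateTimeZone, " "), `[`, character(1), 3)
byzone <- split(dateTimeZone, tzs)
parsed <- Map(as.POSIXct, byzone, tz = names(byzone))
as.POSIXct(unsplit(lapply(parsed, as.numeric), tzs),
           origin = "1970-01-01", tz = "UTC")
# e.g. "2015-09-24 06:00:00 UTC" "2015-09-24 12:00:00 UTC"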

answered Nov 15 '22 by pbaylis