
How do I specify POSIX (time) format for 3 letter tz in R, in order to ignore it?

For output, the specification is %Z (see ?strptime). But for input, how does that work?

To clarify, it'd be great for the time zone abbreviation to be parsed into useful information by as.POSIXct(), but the core of the question is how to get the function to at least ignore the time zone.

Here is my best workaround, but is there a particular format code to pass to as.POSIXct() that will work for all time zones?

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input

formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times),"%Y")
as.POSIXct(times, format=formats) # works

Edit: Here is the output from the last line, as well as its class (from a separate call); the output is as expected. From the console:

> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"

> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] ""
rbatt asked Sep 27 '22 03:09



1 Answer

The short answer is, "no, you can't." Those are abbreviations and they are not guaranteed to uniquely identify a specific timezone.

For example, is "EST" Eastern Standard Time in the US or Australia? Is "CST" Central Standard Time in the US or Australia, or is it China Standard Time, or is it Cuba Standard Time?
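If the abbreviations do need to carry meaning, one option is to resolve the ambiguity yourself with an explicit lookup table. The sketch below is only an illustration: `tz_lookup` and the zones it assigns to EDT and GMT are assumptions chosen for this example, not something R provides, and it assumes an English locale so that %a and %b match the day and month names.

```r
# Assumed mapping from abbreviation to Olson zone name; the choices are ours, not R's.
tz_lookup <- c(EDT = "America/New_York", GMT = "UTC")

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")

# Pull out the abbreviation, then remove it from the string.
abbr  <- sub("^.+ ([[:upper:]]{3,5}) [0-9]{4}$", "\\1", times)
no_tz <- sub(" [[:upper:]]{3,5} ", " ", times)
stopifnot(all(abbr %in% names(tz_lookup)))  # refuse abbreviations we have not mapped

# as.POSIXct() accepts a single tz, so parse each element in its own zone
# and collect the results as seconds since the epoch.
secs <- mapply(
  function(s, z) as.numeric(as.POSIXct(s, format = "%a %b %d %H:%M:%S %Y", tz = z)),
  no_tz, tz_lookup[abbr], USE.NAMES = FALSE
)

# Rebuild a single POSIXct vector, displayed in UTC.
parsed <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S %Z")
```

With this mapping the EDT entry lands four hours later in UTC than the GMT entry. Any abbreviation missing from the table stops the script rather than being guessed.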


I just noticed that you're not trying to parse the timezone abbreviation, you are simply trying to avoid it. I don't know of a way to tell strptime to ignore arbitrary characters. I do know that it will ignore anything in the character representation of the time after the end of the format string. For example:

R> # The year is not parsed, so the current year is used (2015 when this was run)
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"

Other than that, a regular expression is the only thing I can think of that solves this problem. Unlike your example, I would use the regex on the input character vector to remove all 3-5 character timezone abbreviations.

R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
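For repeated use, the same idea can be wrapped in a small function. This is a minimal sketch, not a general solution: `parse_ignoring_tz` is a hypothetical name, the `tz` argument defaulting to "UTC" is an assumption (the abbreviation is discarded, so the caller decides which zone the clock time belongs to), and the %a and %b codes assume an English locale.

```r
# Hypothetical helper: drop a 3-5 letter uppercase token surrounded by spaces,
# then parse what is left. The discarded abbreviation has no effect on the result.
parse_ignoring_tz <- function(x, tz = "UTC") {
  no_tz <- sub(" [[:upper:]]{3,5} ", " ", x)
  as.POSIXct(no_tz, format = "%a %b %d %H:%M:%S %Y", tz = tz)
}

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
parsed <- parse_ignoring_tz(times)
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Both inputs come back as the same instant here, because the EDT and GMT tokens are thrown away before parsing. Strings that do not match the format come back as NA rather than raising an error.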
Joshua Ulrich answered Oct 18 '22 07:10
