For output, the specification is %Z
(see ?strptime
). But for input, how does that work?
To clarify, it'd be great for the time zone abbreviation to be parsed into useful information by as.POSIXct()
, but more core to be question is how to get the function to at least ignore the time zone.
Here is my best workaround, but is there a particular format code to pass to as.POSIXct()
that will work for all time zones?
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input
formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times),"%Y")
as.POSIXct(times, format=formats) # works
Edit: Here is the output from the last line, as well as its class (from a separate call); the output is as expected. From the console:
> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"
> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
The short answer is, "no, you can't." Those are abbreviations and they are not guaranteed to uniquely identify a specific timezone.
For example, is "EST" Eastern Standard Time in the US or Australia? Is "CST" Central Standard Time in the US or Australia, or is it China Standard Time, or is it Cuba Standard Time?
I just noticed that you're not trying to parse the timezone abbreviation, you are simply trying to avoid it. I don't know of a way to tell strptime
to ignore arbitrary characters. I do know that it will ignore anything in the character representation of the time after the end of the format string. For example:
R> # The year is not parsed, so the current year is used
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
Other than that, a regular expression is the only thing I can think of that solves this problem. Unlike your example, I would use the regex on the input character vector to remove all 3-5 character timezone abbreviations.
R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With