Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using strptime %z with special timezone format

I am working with .csv data that was exported from Teradata. Several columns were originally timestamps with timezones, so after loading the .csv in R I'd like to convert these columns (which are loaded as strings) to POSIXlt or POSIXct. I am using strptime, but the format of the timezone from the .csv file does not match what strptime is expecting. For example, it expects -0400 but the .csv has the format -04:00 where a colon separates the hours and minutes.

I can remove the colon, but this is an extra step and complication I'd like to avoid if possible. Is there a way to tell strptime to use a different format for the timezone (%z)?

Here is an example:

## Example data:
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
format <- "%Y-%m-%d %H:%M:%OS%z"

## Doesn't work:
strptime(x,format)
## [1] NA NA

## Ignores the timezone:
as.POSIXct(x)
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 22:17:13 EDT"

## Remove the last colon:
x2 <- gsub("(.*):", "\\1", x)
x2
## [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"

## This works, but requires extra processing (removing the colon)
strptime(x2,format)
## [1] "2011-10-12 22:17:13" "2011-10-12 18:17:13"

So I'm looking to achieve this last result using something like strptime(x,"%Y-%m-%d %H:%M:%OS%zz"), where %zz is a custom expression for the timezone that recognizes the -04:00 format. Or %zH:%zM might be even better.

If this isn't possible, does anyone have a slick/flexible function for converting strings (of various formats) to dates for multiple columns of a data.frame/data.table?

like image 826
dnlbrky Avatar asked Feb 22 '13 19:02

dnlbrky


2 Answers

It turns out lubridate can handle this format:

library(lubridate)
ymd_hms(x)
## [1] "2011-10-13 02:17:13 UTC" "2011-10-12 22:17:13 UTC"

Or, to display in the local timezone:

with_tz(ymd_hms(x))
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 18:17:13 EDT"

For more flexibility (still using lubridate):

parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z")

For faster speed (amongst lubridate options):

lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")

Timings:

microbenchmark(
  ymd_hms(x),
  parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z"),
  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO"),
  strptime(gsub("(.*):", "\\1", x), format)
)

## Unit: microseconds
##                                               expr      min       lq       mean    median        uq      max neval
##                                         ymd_hms(x) 1523.819 1578.495 1715.14577 1629.5385 1744.3695 2850.393   100
##         parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z") 1108.676 1150.633 1273.77301 1190.3315 1264.8050 5947.204   100
##  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")   89.838  103.390  112.45338  107.8425  115.2265  216.512   100
##        strptime(gsub("(.*):", "\\\\1", x), format)   46.716   58.294   71.90934   69.9415   86.5860  105.044   100
like image 74
dnlbrky Avatar answered Oct 16 '22 13:10

dnlbrky


I've just come across this question trying to achieve the same thing.

The only thing I've found to fix it is to use regex to remove the colon, as you have mentioned. You can tighten the regex a little to avoid making mistakes in the replacement.

x2 <- gsub('^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]+[+-][0-9]{2}):([0-9]{2})$',
           '\\1\\2',
           x)
# [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"
like image 29
mathematical.coffee Avatar answered Oct 16 '22 15:10

mathematical.coffee