
Parsing ISO8601 date and time format in R [duplicate]

Tags: datetime, r

This should be quick - we are parsing the following format in R:

2013-04-05T07:49:54-07:00

My current approach is

require(stringr) 
timenoT <- str_replace_all("2013-04-05T07:49:54-07:00", "T", " ") 
timep <- strptime(timenoT, "%Y-%m-%d %H:%M:%S%z", tz="UTC")

but it gives NA.

Rico asked Apr 05 '13 16:04



2 Answers

%z is the signed offset from UTC in hours and minutes, in the format hhmm, not hh:mm. Here's one way to remove the last colon:

newstring <- sub("(.*):(..)$", "\\1\\2", "2013-04-05T07:49:54-07:00")
(timep <- strptime(newstring, "%Y-%m-%dT%H:%M:%S%z", tz="UTC"))
# [1] "2013-04-05 14:49:54 UTC"

Also note that you don't have to remove the "T".
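The colon removal above can be wrapped in a small function so it applies to a whole vector of timestamps. This is only a sketch: the name parse_iso8601 is hypothetical (it is not a base R function), and it assumes every input ends in either an hh:mm offset or a literal "Z" for UTC.

```r
# Hypothetical helper, not part of base R: normalise the offset, then parse.
parse_iso8601 <- function(x) {
  # Assumption: a trailing "Z" means a zero offset from UTC.
  x <- sub("Z$", "+0000", x)
  # Drop the colon from a trailing offset, e.g. -07:00 becomes -0700.
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

parsed <- parse_iso8601(c("2013-04-05T07:49:54-07:00",
                          "2013-04-05T14:49:54Z"))
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Both inputs describe the same instant, so the two formatted values should match. Anchoring the pattern on the sign and digits, rather than on any character, keeps the substitution from touching strings that have no offset at all.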

Joshua Ulrich answered Sep 20 '22 15:09


You don't need the string replacement.

NA just means that the format as a whole did not match, so build your expression up in pieces:

R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%d") 
[1] "2013-04-05"
R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%dT%H:%M") 
[1] "2013-04-05 07:49:00"
R> strptime("2013-04-05T07:49:54-07:00", "%Y-%m-%dT%H:%M:%S")
[1] "2013-04-05 07:49:54" 
R>

Also, %z is standard only for output in the underlying C library. R does accept it on input (see ?strptime), but in the form +hhmm or -hhmm, without a colon. So your NA most likely comes from the colon in the offset rather than from the rest of the format.
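The piece-by-piece check above can also be run as a loop over progressively longer format strings, which shows at a glance where parsing stops working. This is a rough sketch; the variable names are illustrative, and whether the last format matches may depend on your R version and platform.

```r
x <- "2013-04-05T07:49:54-07:00"

# Progressively longer formats; strptime() ignores unmatched trailing input.
formats <- c("%Y-%m-%d",
             "%Y-%m-%dT%H:%M",
             "%Y-%m-%dT%H:%M:%S",
             "%Y-%m-%dT%H:%M:%S%z")

# TRUE where the format parsed, FALSE where strptime() returned NA.
ok <- vapply(formats,
             function(f) !is.na(strptime(x, f, tz = "UTC")),
             logical(1))
print(ok)
```

The first FALSE entry, if any, points at the conversion specification that needs attention.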

Dirk Eddelbuettel answered Sep 18 '22 15:09