Is there a reliable way to detect POSIXlt objects representing a time which does not exist due to DST?

Tags: datetime, r, dst

I have the following problem: the date column in the data I receive contains date-times that do not exist because of daylight saving time. (For example, 2015-03-29 02:00 does not exist in Central European Time, because DST takes effect on that day and the clock jumps directly from 01:59 to 03:00.)

Is there an easy and reliable way to determine if a date is valid with respect to daylight saving time?

This is not trivial because of the properties of the datetime classes.

# generating the invalid time as POSIXlt object
test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")

# the object seems to represent something at least partially reasonable, notice the missing timezone specification though
test
# [1] "2015-03-29 02:00:00"

# strangely enough this object is regarded as NA by is.na
is.na(test)
# [1] TRUE

# which is no surprise if you consider:
is.na.POSIXlt
# function (x) 
# is.na(as.POSIXct(x))

as.POSIXct(test)
# [1] NA

# inspecting the interior of my POSIXlt object:
unlist(test)
# sec    min   hour   mday    mon   year   wday   yday  isdst   zone gmtoff
# "0"    "0"    "2"   "29"    "2"  "115"    "0"   "87"   "-1"     ""     NA

So the simplest approach I could think of is to check the isdst field of the POSIXlt object. The help page for POSIXt describes the field as follows:

isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.

Is checking the isdst field safe, in the sense that this field is -1 only if the date is invalid due to DST changes, or can it be -1 for other reasons as well?

Info on version, platform and locale

R.version
# _                           
# platform       x86_64-w64-mingw32          
# arch           x86_64                      
# os             mingw32                     
# system         x86_64, mingw32             
# status                                     
# major          3                           
# minor          3.1                         
# year           2016                        
# month          06                          
# day            21                          
# svn rev        70800                       
# language       R                           
# version.string R version 3.3.1 (2016-06-21)
# nickname       Bug in Your Hair            
Sys.getlocale()
# [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
asked Sep 16 '16 at 08:09 by snaut



2 Answers

The value of as.POSIXct(test) seems to be platform-dependent, which adds a layer of complexity to finding a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) produces NA, as also reported by the OP. However, on my Linux platform (same R version), I get the following:

times <- c("2015-03-29 01:00",
           "2015-03-29 02:00",
           "2015-03-29 03:00")

test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")

test
#[1] "2015-03-29 01:00:00 CET"  "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
as.POSIXct(test)
#[1] "2015-03-29 01:00:00 CET"  "2015-03-29 01:00:00 CET"  "2015-03-29 03:00:00 CEST"
as.character(test)
#[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
as.character(as.POSIXct(test))
#[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"

The one thing that we can rely on is not the actual value of as.POSIXct(test), but that it will be different from test when test is an invalid date/time:

(as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
# TRUE FALSE  TRUE

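I'm not sure that as.character is strictly necessary here, but I include it just to ensure that we don't fall foul of any other odd behaviours of POSIX objects. The %in% TRUE part matters on Windows, where as.POSIXct(test) is NA: it turns the resulting NA comparison into FALSE.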
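The same round-trip idea can be wrapped into a small vectorised helper that works directly on the input strings. This is a sketch for illustration, not part of the original answer: is_existing_local_time is a hypothetical name, and it assumes the strings are already written in the canonical form that format() produces for the given format (zero-padded fields, no extra seconds). A time skipped by DST either becomes NA (as on Windows) or comes back as a different clock time (as on Linux above), so in both cases it fails the comparison.

```r
# Hypothetical helper (not from the original answer): TRUE when the local
# clock time in `x` actually exists in time zone `tz`.
# Assumption: `x` is written exactly as format(..., format = format) would write it.
is_existing_local_time <- function(x, tz, format = "%Y-%m-%d %H:%M") {
  # Parse the strings as local times in `tz`; nonexistent times become NA
  # or are shifted to a neighbouring instant, depending on the platform.
  ct <- as.POSIXct(x, format = format, tz = tz)
  # Format the parsed instants back into local clock strings.
  back <- format(ct, format = format, tz = tz)
  # A time is valid only if it survives the round trip unchanged;
  # the !is.na(back) term turns NA results into FALSE.
  !is.na(back) & back == x
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
is_existing_local_time(times, tz = "CET")
```

With the three times from the answer this should return TRUE, FALSE, TRUE, provided the system's time zone database knows "CET" (otherwise R falls back to UTC, where every time exists).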

answered Nov 10 '22 at 00:11 by dww



The manual says that strptime does not validate whether times exist in a specific time zone, given the transitions to/from daylight saving time (?strptime). It also says that as.POSIXct does perform this validation, so, following the manual, one should check the resulting POSIXct object for NA (?as.POSIXct), which identifies the non-existent time in the question's example. The result is, however, OS-specific for times that occur twice in a time zone (?as.POSIXct):

Remember that in most time zones some times do not occur and some occur twice because of transitions to/from ‘daylight saving’ (also known as ‘summer’) time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so.

and

One issue is what happens at transitions to and from DST, for example in the UK

as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"))
as.POSIXct(strptime("2010-10-31 01:30:00", "%Y-%m-%d %H:%M:%S"))

are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be ‘NA’, but the second could be interpreted as either BST or GMT (and common OSes give both possible values).
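Since as.POSIXct may silently pick one of the two readings of a repeated time, one way to flag such times yourself is to look at the instants one DST shift before and after the parsed value: if either of them displays the same local clock time as the input, that clock time occurs twice. The sketch below is an illustration, not something the manual prescribes; is_ambiguous_local_time and its shift argument are hypothetical names, and it assumes the DST offset change is exactly one hour (true for CET and the UK, but not for every time zone).

```r
# Hypothetical helper: TRUE when the local clock time in `x` occurs twice
# in time zone `tz` (e.g. during the autumn fall-back hour).
# Assumption: the DST offset change equals `shift` seconds (one hour by default).
is_ambiguous_local_time <- function(x, tz, format = "%Y-%m-%d %H:%M",
                                    shift = 3600) {
  # as.POSIXct picks one of the possible instants (which one is OS-specific)
  ct <- as.POSIXct(x, format = format, tz = tz)
  # Does `instant` display as the same local clock time as the input?
  shows_same_clock <- function(instant) {
    s <- format(instant, format = format, tz = tz)
    !is.na(s) & s == x
  }
  # If the instant `shift` seconds earlier or later shows the same clock
  # time, that clock time maps to two different instants.
  shows_same_clock(ct - shift) | shows_same_clock(ct + shift)
}

# Autumn transition in CET: clocks go back from 03:00 CEST to 02:00 CET
is_ambiguous_local_time(c("2015-10-25 01:30", "2015-10-25 02:30",
                          "2015-10-25 03:30"), tz = "CET")
```

For these three times the result should be FALSE, TRUE, FALSE. Checking both directions keeps the test independent of which of the two instants the OS happened to choose, and a time that was skipped rather than repeated (or that came back as NA) is reported as FALSE.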

answered Nov 10 '22 at 01:11 by Tomas Kalibera
