I have the following problem: the date column in the data I receive contains dates that do not exist because of daylight saving time. (For example, 2015-03-29 02:00 does not exist in Central European Time, because DST takes effect on that day and the clock is set directly from 01:59 to 03:00.)
Is there an easy and reliable way to determine if a date is valid with respect to daylight saving time?
This is not trivial because of the properties of the datetime classes.
# generating the invalid time as POSIXlt object
test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")
# the object seems to represent something at least partially reasonable, notice the missing timezone specification though
test
# [1] "2015-03-29 02:00:00"
# strangely enough this object is regarded as NA by is.na
is.na(test)
# [1] TRUE
# which is no surprise if you consider:
is.na.POSIXlt
# function (x)
# is.na(as.POSIXct(x))
as.POSIXct(test)
# [1] NA
# inspecting the interior of my POSIXlt object:
unlist(test)
# sec min hour mday mon year wday yday isdst zone gmtoff
# "0" "0" "2" "29" "2" "115" "0" "87" "-1" "" NA
So the simplest way I thought of is to check the isdst field of the POSIXlt object. The help for POSIXt describes the field as follows:
isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
Is checking the isdst field safe, in the sense that this field is -1 only if the date is invalid due to DST changes, or can it be -1 for other reasons?
Info on version, platform and locale
R.version
# _
# platform x86_64-w64-mingw32
# arch x86_64
# os mingw32
# system x86_64, mingw32
# status
# major 3
# minor 3.1
# year 2016
# month 06
# day 21
# svn rev 70800
# language R
# version.string R version 3.3.1 (2016-06-21)
# nickname Bug in Your Hair
Sys.getlocale()
# [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
The value of as.POSIXct(test) seems to be platform dependent, which makes it harder to find a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) produces NA, as also reported by the OP. However, on my Linux platform (same R version), I get the following:
times = c ("2015-03-29 01:00",
"2015-03-29 02:00",
"2015-03-29 03:00")
test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")
test
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
as.POSIXct(test)
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 01:00:00 CET" "2015-03-29 03:00:00 CEST"
as.character(test)
#[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
as.character(as.POSIXct(test))
#[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"
The one thing that we can rely on is not the actual value of as.POSIXct(test), but that it will be different from test when test is an invalid date/time:
(as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
# TRUE FALSE TRUE
I'm not sure that as.character is strictly necessary here, but I include it just to ensure that we don't fall foul of any other odd behaviours of POSIX objects.
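The round-trip comparison above can be wrapped in a small function. This is only a minimal sketch: the name is_valid_local_time and its arguments are made up for illustration, and it assumes that the time zone given in tz is known to the system. It uses format() with an explicit format string instead of as.character, so that both sides are rendered the same way.

```r
# Hypothetical helper (not part of base R): returns TRUE where the
# wall-clock time in `x` survives a round trip through POSIXct in `tz`.
is_valid_local_time <- function(x, format = "%Y-%m-%d %H:%M", tz = "CET") {
  fmt <- "%Y-%m-%d %H:%M:%S"
  lt <- strptime(x, format = format, tz = tz)  # POSIXlt, not validated
  ct <- as.POSIXct(lt)                         # NA or shifted if the time does not exist
  # %in% TRUE turns NA comparisons (e.g. NA from as.POSIXct) into FALSE
  (format(lt, fmt) == format(ct, fmt)) %in% TRUE
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
is_valid_local_time(times)
```

On both platforms discussed above, the second element should come out as FALSE: on Windows because as.POSIXct returns NA, on Linux because the converted time is shifted to 01:00.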
The manual says that strptime does not validate whether times exist in a specific time zone across transitions to/from daylight saving time (?strptime). The manual also says that as.POSIXct does perform this validation, so, following the manual, one should check the resulting POSIXct object for NA (?as.POSIXct), which identifies a non-existent time as in the question's example. The result is, however, OS-specific for times that occur twice in a time zone (?as.POSIXct):
Remember that in most time zones some times do not occur and some occur twice because of transitions to/from ‘daylight saving’ (also known as ‘summer’) time. strptime does not validate such times (it does not assume a specific time zone), but conversion by as.POSIXct will do so.
and
One issue is what happens at transitions to and from DST, for example in the UK
as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S"))
as.POSIXct(strptime("2010-10-31 01:30:00", "%Y-%m-%d %H:%M:%S"))
are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be ‘NA’, but the second could be interpreted as either BST or GMT (and common OSes give both possible values).
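For the ambiguous case, one possible check is to shift the converted time by the size of the DST change and see whether the wall-clock time stays the same. This is a rough sketch, not something taken from the manual: the name is_ambiguous_local_time is hypothetical, and the code assumes a DST shift of exactly one hour and that the time zone name is known to the system.

```r
# Hypothetical helper: TRUE where the wall-clock time occurs twice in `tz`.
# Assumption: the DST shift is exactly `shift` seconds (one hour by default).
is_ambiguous_local_time <- function(x, format = "%Y-%m-%d %H:%M:%S",
                                    tz = "Europe/London", shift = 3600) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  lt <- strptime(x, format = format, tz = tz)
  ct <- as.POSIXct(lt)  # the OS picks one of the two possible instants
  wall <- format(lt, fmt)
  # If the instant one shift earlier or later shows the same wall-clock
  # time, that wall-clock time exists twice.
  same_later   <- (format(ct + shift, fmt) == wall) %in% TRUE
  same_earlier <- (format(ct - shift, fmt) == wall) %in% TRUE
  same_later | same_earlier
}

is_ambiguous_local_time(c("2010-10-31 01:30:00", "2010-10-31 03:30:00"))
```

Because both neighbouring instants are tested, the result should not depend on which of the two interpretations (BST or GMT) the OS chooses.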