Sample data in CSV format. Save it in a file named broken_posix.csv:
Date
3/10/2012 23:00
3/11/2012 0:00
3/11/2012 1:00
3/11/2012 2:00
3/11/2012 3:00
3/11/2012 4:00
3/11/2012 5:00
3/11/2012 6:00
3/11/2012 7:00
3/11/2012 8:00
3/11/2012 9:00
3/11/2012 10:00
3/11/2012 11:00
3/11/2012 12:00
3/11/2012 13:00
3/11/2012 14:00
3/11/2012 15:00
3/11/2012 16:00
3/11/2012 17:00
3/11/2012 18:00
3/11/2012 19:00
3/11/2012 20:00
3/11/2012 21:00
3/11/2012 22:00
3/11/2012 23:00
3/12/2012 0:00
3/12/2012 1:00
3/12/2012 2:00
3/12/2012 3:00
3/12/2012 4:00
3/12/2012 5:00
3/12/2012 6:00
3/12/2012 7:00
3/12/2012 8:00
3/12/2012 9:00
3/12/2012 10:00
3/12/2012 11:00
So I have this file, broken_posix.csv. I can read the file just fine with
a_var <- read.csv("broken_posix.csv")
Then I can convert it to POSIXct using
a_var_posixct <- as.POSIXct(strptime(as.character(a_var$Date), '%m/%d/%Y %H:%M'))
or to POSIXlt with
a_var_posixlt <- strptime(as.character(a_var$Date), '%m/%d/%Y %H:%M')
The problem is that when I use POSIXct, I get 4 NA values in my data every year. When I use POSIXlt, I get one NA value, on March 11, 2012 at 2:00 (the daylight saving time transition).
You'll see what I mean when you run:
which(is.na(a_var_posixct))
which(is.na(a_var_posixlt))
a_var_posixct[4]
a_var_posixlt[4]
The fourth value always comes back as NA whenever you apply an operation to it, even though for POSIXlt it clearly prints as a date value.
I've tried omitting the value, only to end up messing up the rest of the POSIX vector. I've tried assigning the POSIX vector to itself in an attempt to clear the NA flag, to no effect. I've even tried converting it to a character value, only to lose the hour and minute formatting.
I think this happens because of daylight saving time. It's very frustrating to deal with, because when I run other functions on the dates I have to work around the NA values since I can't change them. I could aggregate the data by day, or just use Date objects, but that doesn't seem like the right method.
Using a time zone without daylight saving time fixes this kind of problem for me:
a_var_posixlt <- strptime(as.character(a_var$Date), '%m/%d/%Y %H:%M', tz = "GMT")
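As a minimal sketch of why this works, assuming a short hypothetical vector `dates` in place of the CSV column and using "UTC" (which, like "GMT", has no DST transitions), every wall-clock time parses to a valid value and the hourly spacing is preserved:

```r
# Hypothetical three-row sample around the hour from the question
dates <- c("3/11/2012 1:00", "3/11/2012 2:00", "3/11/2012 3:00")

# Parse directly to POSIXct in a zone without daylight saving time
parsed_utc <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M", tz = "UTC")

# No NA values are produced, because 2:00 exists in UTC
anyNA(parsed_utc)

# Consecutive differences, expressed in hours
as.numeric(diff(parsed_utc), units = "hours")
```

The trade-off is that the values are then labelled as UTC rather than local time, which only matters if the data later needs to be compared against real local clock times.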
From ?as.POSIXct:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct". Any conversion that needs to go between the two date-time classes requires a timezone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected timezone. One issue is what happens at transitions to and from DST, for example in the UK
as.POSIXct(strptime('2011-03-27 01:30:00', '%Y-%m-%d %H:%M:%S'))
as.POSIXct(strptime('2010-10-31 01:30:00', '%Y-%m-%d %H:%M:%S'))
are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be NA, but the second could be interpreted as either BST or GMT (and common OSes give both possible values). Note too (see strftime), OS facilities may not format invalid times correctly.
Your 4 NA's will presumably be on the hour when the clocks change twice a year.
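To check which rows are affected, one rough approach is to parse the same strings in both a DST-free zone and the local zone, then compare. This is only a sketch: the zone "America/New_York" is an assumption (the question does not say which zone the machine uses), `dates` is a hypothetical stand-in for the CSV column, and, as the help page notes, what the local parse returns for a non-existent time is OS-specific.

```r
# Hypothetical sample; replace with as.character(a_var$Date)
dates <- c("3/11/2012 1:00", "3/11/2012 2:00", "3/11/2012 3:00")
fmt <- "%m/%d/%Y %H:%M"

# Assumed local zone that observes DST (not stated in the question)
local <- as.POSIXct(dates, format = fmt, tz = "America/New_York")

# Reference parse in a zone without DST
utc <- as.POSIXct(dates, format = fmt, tz = "UTC")

# Strings that are well-formed (valid in UTC) but fail in the local zone
# are candidates for times skipped by the spring-forward transition
in_gap <- is.na(local) & !is.na(utc)
dates[in_gap]
```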