
How do I clear an NA flag for a posix value?

Tags:

r

Sample data in CSV format. Save it in a file named broken_posix.csv.

Date
3/10/2012 23:00
3/11/2012 0:00
3/11/2012 1:00
3/11/2012 2:00
3/11/2012 3:00
3/11/2012 4:00
3/11/2012 5:00
3/11/2012 6:00
3/11/2012 7:00
3/11/2012 8:00
3/11/2012 9:00
3/11/2012 10:00
3/11/2012 11:00
3/11/2012 12:00
3/11/2012 13:00
3/11/2012 14:00
3/11/2012 15:00
3/11/2012 16:00
3/11/2012 17:00
3/11/2012 18:00
3/11/2012 19:00
3/11/2012 20:00
3/11/2012 21:00
3/11/2012 22:00
3/11/2012 23:00
3/12/2012 0:00
3/12/2012 1:00
3/12/2012 2:00
3/12/2012 3:00
3/12/2012 4:00
3/12/2012 5:00
3/12/2012 6:00
3/12/2012 7:00
3/12/2012 8:00
3/12/2012 9:00
3/12/2012 10:00
3/12/2012 11:00

So I have this file broken_posix.csv. I can read the file just fine with

a_var <- read.csv("broken_posix.csv")

Then I can convert it to POSIXct using

a_var_posixct = as.POSIXct(strptime(as.character(a_var$Date), '%m/%d/%Y %H:%M'))

or with

a_var_posixlt = strptime(as.character(a_var$Date), '%m/%d/%Y %H:%M')

The problem is that when I use POSIXct, I get 4 NA values in my data every year. When I use POSIXlt, I get one NA value, on March 11, 2012 at 2:00 (the start of daylight saving time).

You'll see what I mean when you run

which(is.na(a_var_posixct))
which(is.na(a_var_posixlt))

a_var_posixct[4]
a_var_posixlt[4]

The fourth value always comes back as NA whenever you apply an operation to it, even though the POSIXlt version clearly holds a date value.
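A minimal sketch of why the POSIXlt value can look like a date and still test as NA. The zone name `America/New_York` and the variable names are assumptions for illustration, since the question does not say which zone the machine uses. As far as I know, `is.na()` on a POSIXlt goes through the POSIXct conversion, and what that conversion does with a nonexistent local time is OS-specific.

```r
# Assumption: a US zone where 2:00 on 2012-03-11 falls in the spring-forward gap.
assumed_tz <- "America/New_York"

lt <- strptime("3/11/2012 2:00", "%m/%d/%Y %H:%M", tz = assumed_tz)

# POSIXlt stores the parsed fields separately, so the wall-clock pieces
# are still present even if the instant itself does not exist.
lt$hour   # hour field as parsed
lt$mday   # day-of-month field as parsed

# The NA test depends on converting to POSIXct, which is where a nonexistent
# local time can become NA (the result is OS-specific).
is.na(lt)
```

So there is no separate "NA flag" to clear here: the fields are kept, but the time they describe may not map to a valid instant in that zone.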

I've tried omitting the value, only to end up messing up the rest of the POSIX vector. I've tried assigning the POSIX vector to itself, in an attempt to clear the NA flag, to no effect. I've even tried converting it to a character value, only to lose the hour and minute formatting.

I think this happens because of daylight saving time. It's very frustrating to deal with, because when I run other functions on the dates I have to work around the NA values, since I can't change them. I could aggregate the data by day, or just use Date objects, but that doesn't seem like the right method.

obesechicken13 asked Jul 17 '12 19:07



2 Answers

Using a time zone without daylight saving time fixes this kind of problem for me.

a_var_posixlt = strptime(as.character( a_var$Date) , '%m/%d/%Y %H:%M',tz="GMT")
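A quick check of the idea, as a sketch on four of the sample rows (the variable names are illustrative, and "UTC" is used here in place of "GMT"; neither zone has a DST gap):

```r
# Four rows from the sample data, spanning the hour that is skipped in US zones.
dates <- c("3/11/2012 1:00", "3/11/2012 2:00",
           "3/11/2012 3:00", "3/11/2012 4:00")

# Parse in a zone with no DST transitions, then convert to POSIXct.
parsed_utc <- as.POSIXct(strptime(dates, "%m/%d/%Y %H:%M", tz = "UTC"))

any(is.na(parsed_utc))   # check for values that failed to convert
diff(parsed_utc)         # spacing between consecutive timestamps
```

The trade-off is that the timestamps are then interpreted as UTC/GMT clock times. If the data was recorded in local time, the instants are shifted relative to the real ones, which may or may not matter for the analysis.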
Roland answered Nov 05 '22 02:11



From ?as.POSIXct:

Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct". Any conversion that needs to go between the two date-time classes requires a timezone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected timezone. One issue is what happens at transitions to and from DST, for example in the UK

as.POSIXct(strptime('2011-03-27 01:30:00', '%Y-%m-%d %H:%M:%S'))
as.POSIXct(strptime('2010-10-31 01:30:00', '%Y-%m-%d %H:%M:%S'))

are respectively invalid (the clocks went forward at 1:00 GMT to 2:00 BST) and ambiguous (the clocks went back at 2:00 BST to 1:00 GMT). What happens in such cases is OS-specific: one should expect the first to be NA, but the second could be interpreted as either BST or GMT (and common OSes give both possible values). Note too (see strftime), OS facilities may not format invalid times correctly.

Your 4 NA's will presumably be on the hour when the clocks change twice a year.
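To see which inputs are affected, one option is to pull out the original strings whose conversion came back NA. This is only a sketch: the zone name `America/New_York` is an assumption (the question does not state the local zone), the variable names are illustrative, and whether the 2:00 row comes back NA is OS-specific, as the documentation above says.

```r
# Three rows from the sample data around the US spring-forward transition.
dates <- c("3/11/2012 1:00", "3/11/2012 2:00", "3/11/2012 3:00")

# Assumption: the data is meant to be read as US Eastern local time.
assumed_tz <- "America/New_York"

parsed <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M", tz = assumed_tz)

# Original strings whose conversion produced NA, if any.
failed <- dates[is.na(parsed)]
failed
```

If the strings listed there all sit at a clock-change hour, that supports the DST explanation.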

shhhhimhuntingrabbits answered Nov 05 '22 02:11
