I have a data set that has dates and times for in and out. Each line is an in and out set, but some are blank. I can remove the blanks with na.omit and a nice read in (it was a csv, and na.strings=c("")
works on the read.csv
).
Of course, because the real world is never like the tutorial, some of the times are only dates, so my as.POSIXlt(Dataset$In,format="%m/%d/%Y %H:%M")
returns NA on the "only date no time"s.
na.omit does not remove these lines. so the questions are 2
Why doesn't na.omit work, or how can I get it to work?
Better, How can I convert one column into both Dates and Times (in the posix format) without 2 calls or with some sort of optional parameter in the format string? (or is this even possible?).
This is a sample of the dates and times. I can't share the real file, 1 it's huge, 2 it's PII.
Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00
3,,
4,,
4,8/16/2015 6:00,8/16/2015 20:00
DF <- read.table(text = "Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00", header = TRUE, sep = ",",
stringsAsFactors = FALSE) #set this option during import
DF$In[nchar(DF$In) < 13] <- paste(DF$In[nchar(DF$In) < 13], "0:00")
DF$Out[nchar(DF$Out) < 13] <- paste(DF$Out[nchar(DF$Out) < 13], "0:00")
DF$In <- as.POSIXct(DF$In, format = "%m/%d/%Y %H:%M", tz = "GMT")
DF$Out <- as.POSIXct(DF$Out, format = "%m/%d/%Y %H:%M", tz = "GMT")
# Id In Out
#1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
#2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
#3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
#4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
#5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
#6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
#7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
#8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
#9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
na.omit
doesn't work with POSIXlt
objects because it is documented to "handle vectors, matrices and data frames comprising vectors and matrices (only)." (see help("na.omit")
). And in the strict sense, POSIXlt
objects are not vectors:
unclass(as.POSIXlt(DF$In))
#$sec
#[1] 0 0 0 0 0 0 0 0 0
#
#$min
#[1] 0 4 50 0 0 0 0 0 0
#
#$hour
#[1] 8 8 8 0 0 8 4 9 0
#
#$mday
#[1] 15 16 17 18 15 16 15 16 17
#
#$mon
#[1] 7 7 7 7 7 7 7 7 7
#
#$year
#[1] 115 115 115 115 115 115 115 115 115
#
#$wday
#[1] 6 0 1 2 6 0 6 0 1
#
#$yday
#[1] 226 227 228 229 226 227 226 227 228
#
#$isdst
#[1] 0 0 0 0 0 0 0 0 0
#
#attr(,"tzone")
#[1] "GMT"
There is hardly any reason to prefer POSIXlt
over POSIXct
(which is an integer giving the number of seconds since the origin internally and thus needs less memory).
You've been given a couple of strategies that bring these character values in and process "in-place". I almost never use as.POSIXlt
since there are so many pitfalls in dealing with the list-in-list structures that it returns, especially considering its effective incompatibility with dataframes. Here's a method that does the testing and coercion at the read.
-level by defining an as
-method:
setOldClass("inTime", prototype="POSIXct")
setAs("character", "inTime",
function(from) structure( ifelse( is.na(as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
as.POSIXct(from, format="%m/%d/%Y") ,
as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
class="POSIXct" ) )
read.csv(text=txt, colClasses=c("numeric", 'inTime','inTime') )
Id In Out
1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
The structure
"envelope" is needed because of the rather strange behavior of ifelse, which otherwise would return a numeric object rather than an object of class-'POSIXct'.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With