
data.table colClasses conversion to POSIXct

Tags: r, data.table

Why doesn't the colClasses argument to data.table::fread seem to convert the REQUEST_DATE column to POSIXct in the example below? It converts the ROW_ID column without issue.

library(data.table)

txt <- "ROW_ID,REQUEST_TYPE,REQUEST_DATE
1,OTHER,2009-07-31 07:35:38
2,OTHER,2009-07-30 21:18:35
3,OTHER,2009-07-30 21:18:30
4,OTHER,2009-07-30 21:18:40
5,OTHER,2009-07-30 21:18:39
6,QUERY,2009-07-30 21:19:29
7,OTHER,2009-07-30 21:18:42
8,OTHER,2009-07-30 21:18:45
9,OTHER,2009-07-31 07:35:31
10,OTHER,2009-07-31 07:35:30
"
dt <- fread(txt, colClasses = c(ROW_ID = "character", REQUEST_DATE = "POSIXct"))

The typical conversion also works:

dt[, as.POSIXct(REQUEST_DATE)]
 [1] "2009-07-31 07:35:38 EDT" "2009-07-30 21:18:35 EDT" "2009-07-30 21:18:30 EDT" "2009-07-30 21:18:40 EDT" "2009-07-30 21:18:39 EDT"
 [6] "2009-07-30 21:19:29 EDT" "2009-07-30 21:18:42 EDT" "2009-07-30 21:18:45 EDT" "2009-07-31 07:35:31 EDT" "2009-07-31 07:35:30 EDT"

In this particular case, however, I can't do dt[, REQUEST_DATE := as.POSIXct(REQUEST_DATE)] because the real data has ~50m rows and many columns. The alternate list syntax doesn't seem to work either:

dt <- fread(txt, colClasses = list(POSIXct = "REQUEST_DATE"))

The data.table help for fread says "A character vector of classes (named or unnamed), as read.csv. Or a named list of vectors of column names or numbers, see examples. colClasses in fread is intended for rare overrides, not for routine use. fread will only promote a column to a higher type if colClasses requests it. It won't downgrade a column to a lower type since NAs would result. You have to coerce such columns afterwards yourself, if you really require data loss."

It isn't clear to me that POSIXct is considered a lower type than character.

I am using data.table version 1.10.0.

ruser9575ba6f asked Jan 27 '17 21:01



1 Answer

As Frank mentions in the comments, it looks like this is a current data.table limitation. I ended up using the fastPOSIXct function in the fasttime package. It converts 50m rows in about a dozen seconds on my laptop, which is quite reasonable for my use case.
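
For anyone landing here, a minimal sketch of that workaround on a two-row sample of the question's data, assuming the fasttime package is installed. The timestamp column is read as character and then converted in place with `:=`, so only that one column is rewritten rather than the whole table. One caveat I'm fairly sure of: fastPOSIXct always interprets the input as UTC/GMT (its tz argument only affects how the result is displayed), so it will not reproduce the EDT local-time parse shown in the question unless the offset is adjusted afterwards.

```r
library(data.table)
library(fasttime)

# Two-row sample in the same layout as the question's data
txt <- "ROW_ID,REQUEST_TYPE,REQUEST_DATE
1,OTHER,2009-07-31 07:35:38
2,OTHER,2009-07-30 21:18:35
"

# Read both overridden columns as plain character
dt <- fread(txt, colClasses = c(ROW_ID = "character", REQUEST_DATE = "character"))

# Convert by reference; fastPOSIXct treats the input strings as UTC
dt[, REQUEST_DATE := fastPOSIXct(REQUEST_DATE, tz = "UTC")]

# Inspect the resulting column classes
print(sapply(dt, function(col) class(col)[1]))
```

On the real data the fread call would point at the file instead of txt; the conversion line stays the same.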

ruser9575ba6f answered Oct 14 '22 16:10
