Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Dates with the data.table package

I recently discovered the data.table package and was now wondering whether or not I should replace some of my plyr-code. To summarize, I really like plyr and I basically achieved everything I wanted. However, my code runs a while and the outlook of speeding things up was enough for me to run some tests. Those tests ended quite soon and here is the reason.

What I do quite often with plyr is to split my data by a column containing dates and do some calculations:

library(plyr)
DF <-  data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})

However, using a column with the Date-format does not seem to work in data.table:

library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.

If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I think it wouldn't be good coding to constantly convert between Date and numeric. So am I missing something or is there just no easy way to achieve that with data.table?

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.6.3 zoo_1.7-2        lubridate_0.2.5  ggplot2_0.8.9    proto_0.3-9.2    reshape_0.8.4   
[7] reshape2_1.1     xtable_1.5-6     plyr_1.5.2      

loaded via a namespace (and not attached):
[1] digest_0.5.0    lattice_0.19-30 stringr_0.5     tools_2.13.1 
like image 480
Christoph_J Avatar asked Aug 08 '11 07:08

Christoph_J


1 Answers

This should work:

DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
                 y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)

The data.table package contains some date/time classes with integer storage mode. See ?IDateTime:

Date and time classes with integer storage for fast sorting and grouping. Still experimental!

  • IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer.
  • ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
  • IDateTime takes a date-time input and returns a data table with columns date and time.
like image 97
rcs Avatar answered Oct 07 '22 04:10

rcs