I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. To summarize, I really like plyr, and I have achieved basically everything I wanted with it. However, my code takes a while to run, and the prospect of speeding things up was enough for me to run some tests. Those tests ended quite soon, and here is the reason.
What I do quite often with plyr is to split my data by a column containing dates and do some calculations:
library(plyr)
DF <- data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})
However, using a date-time column (Sys.time() returns POSIXct) as a key does not seem to work in data.table:
library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.
If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I don't think it would be good practice to constantly convert between date-time and numeric. So am I missing something, or is there just no easy way to achieve this with data.table?
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.3 zoo_1.7-2 lubridate_0.2.5 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4
[7] reshape2_1.1 xtable_1.5-6 plyr_1.5.2
loaded via a namespace (and not attached):
[1] digest_0.5.0 lattice_0.19-30 stringr_0.5 tools_2.13.1
This should work:
DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
The data.table package contains some date/time classes with integer storage mode.
See ?IDateTime:
Date and time classes with integer storage for fast sorting and grouping. Still experimental!
IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer. ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
IDateTime takes a date-time input and returns a data table with columns date and time.
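One caveat with as.ITime on its own: it keeps only the time of day, so rows from different days that share a clock time would land in the same group. A minimal sketch of one way around that, assuming a reasonably recent data.table, is to split the timestamp with IDateTime() and key on both resulting columns (named idate and itime). The fixed start time t0 and the seed are made up for reproducibility and are not from the question.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical fixed timestamp instead of Sys.time()
t0 <- as.POSIXct("2011-08-01 12:00:00", tz = "UTC")
stamps <- t0 + rep(c(0, 60), each = 6)

# IDateTime() splits a date-time into an integer-backed date (idate)
# and an integer time of day in seconds (itime)
DT <- IDateTime(stamps)
DT[, y := c(rnorm(6, 1), rnorm(6, -1))]

# Both columns have integer storage, so they can be used as a key
setkey(DT, idate, itime)

# Same calculation as the ddply() call in the question:
# group mean minus the last value of each group
res <- DT[, list(V1 = mean(y) - y[length(y)]), by = list(idate, itime)]
print(res)
```

Keying on idate and itime together keeps the grouping equivalent to grouping on the original timestamp, at the cost of carrying two columns instead of one.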