
data.table: Bypass setkey when using monotonic transform of a key variable

Tags: r, data.table

Is the 'sorted' attribute part of the official data.table API?

I frequently derive a week/month/quarter/year variable from a date variable, which is of course a monotonic transformation. I then group with by on one of these monotonically derived variables.

I'm wondering whether it is safe to replace the name of my date variable in the sorted attribute with the name of the week/month/etc. variable and have things work properly. That is, is the following safe to do?

library(data.table)
library(lubridate)
DT <- data.table(day=as.Date(c('2006-01-30', '2006-01-31', '2006-02-01', '2006-02-02')),
                 d=1:4, key='day')
DT[, month := floor_date(day, unit='month')]
# is this safe?
attr(DT, 'sorted') <- 'month'

I couldn't figure out if there were some other underlying data structures that reference into the table that might cause problems with this technique.

James asked Mar 24 '14 22:03


1 Answer

Yes, I use that trick all the time when I'm sure that the data is already sorted by the new column. Use setattr rather than attr<-, though, to avoid a copy:

setattr(DT, 'sorted', 'month')

If you look at the code of setkeyv, you'll see that's exactly what it does: it sorts the data and then sets the "sorted" attribute.
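Since the attribute is taken on trust, it may be worth checking the ordering before setting it. Below is a minimal sketch of that idea, not something from the data.table documentation: it reuses the table from the question, derives the month with base R's format() instead of lubridate (my substitution, to keep the example dependency-free), and uses is.unsorted() as an assumed sanity check.

```r
library(data.table)

# Same table as in the question, keyed by day.
DT <- data.table(day = as.Date(c('2006-01-30', '2006-01-31',
                                 '2006-02-01', '2006-02-02')),
                 d = 1:4, key = 'day')

# Monotonic transform of the key: first day of each month
# (base R stand-in for lubridate::floor_date).
DT[, month := as.Date(format(day, '%Y-%m-01'))]

# Sanity check (my addition): refuse to relabel the key unless the
# derived column really is in non-decreasing order.
stopifnot(!is.unsorted(as.numeric(DT$month)))

# Relabel the key by reference, without re-sorting.
setattr(DT, 'sorted', 'month')

key(DT)   # "month"

# Grouping and keyed lookup now go through the new key.
monthly <- DT[, .(total = sum(d)), by = month]
feb     <- DT[.(as.Date('2006-02-01')), d]
```

If the check fails, falling back to setkey(DT, month) is the safe route, since keyed lookups assume the rows really are in key order.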

eddi answered Oct 14 '22 20:10