 

Why data.table's CJ doesn't respect column-major order

Tags:

r

data.table

This is a curiosity more than a question, but I was wondering why the data.table function CJ returns an object with the rightmost index running fastest (as opposed to the base expand.grid function).

An example:

library(data.table)
CJ(a=letters[1:2],b=LETTERS[1:2])
#   a b
#1: a A
#2: a B
#3: b A
#4: b B
expand.grid(a=letters[1:2],b=LETTERS[1:2])
#  a b
#1 a A
#2 b A
#3 a B
#4 b B

I think that the leftmost index running faster is more R-ish. Is there a reason for CJ to follow the other order?
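For comparison, the CJ ordering (rightmost column varying fastest, which is the same as rows sorted by a, then b) can be reproduced in base R starting from expand.grid. This is a minimal sketch using only base R; the variable names g1 and g2 are illustrative, and the result is a plain data.frame with no key.

```r
# Base R only: reproduce CJ-style row order (rightmost column varies fastest)
# starting from expand.grid (leftmost column varies fastest).
a <- letters[1:2]
b <- LETTERS[1:2]

# Option 1: pass the vectors to expand.grid in reverse order,
# then put the columns back in their original order.
g1 <- expand.grid(b = b, a = a, stringsAsFactors = FALSE)[, c("a", "b")]

# Option 2: build the grid as usual, then sort its rows by (a, b).
g2 <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)
g2 <- g2[order(g2$a, g2$b), ]
rownames(g2) <- NULL

print(g1)
#   a b
# 1 a A
# 2 a B
# 3 b A
# 4 b B
```

Both options yield the same rows, in the same order, as CJ(a=letters[1:2],b=LETTERS[1:2]) above; the second one makes explicit that "rightmost index fastest" just means "lexicographically sorted".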

nicola asked Feb 01 '15 22:02



1 Answer

It's convenient to have the result of CJ sorted like that: the output can then be keyed by all of its columns (and it is), which enables operations like this:

library(data.table)
dt = data.table(a = c(1,2,1), b = 1:3, c = c('a', 'a', 'b'))
setkey(dt, a, c)  # sorts dt by (a, c) in place; returns invisibly
dt
#   a b c
#1: 1 1 a
#2: 1 3 b
#3: 2 2 a

dt[CJ(unique(a), unique(c))]
#   a  b c
#1: 1  1 a
#2: 1  3 b
#3: 2  2 a
#4: 2 NA b

# just checking the key:
key(dt[, CJ(unique(a), unique(c))])
#[1] "V1" "V2"
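Without data.table, a similar "fill in the missing combinations" result can be approximated in base R by building the full grid with expand.grid and left-joining the data onto it with merge(all.x = TRUE). This is only a sketch of an analogue to the keyed join above, not how data.table implements it; the names df, grid and res are illustrative.

```r
# Base R analogue of dt[CJ(unique(a), unique(c))]:
# build every (a, c) combination, then left-join the data onto it.
df <- data.frame(a = c(1, 2, 1), b = 1:3, c = c("a", "a", "b"),
                 stringsAsFactors = FALSE)

# All combinations of the distinct values of a and c
grid <- expand.grid(a = sort(unique(df$a)), c = sort(unique(df$c)),
                    stringsAsFactors = FALSE)

# all.x = TRUE keeps combinations with no match, filling b with NA
res <- merge(grid, df, by = c("a", "c"), all.x = TRUE)

# Sort rows by (a, c), as the keyed data.table result is, and reorder columns
res <- res[order(res$a, res$c), c("a", "b", "c")]
rownames(res) <- NULL
print(res)
#   a  b c
# 1 1  1 a
# 2 1  3 b
# 3 2  2 a
# 4 2 NA b
```

Here the sorting has to be requested explicitly; with CJ the grid already comes out sorted and keyed on all its columns, which is the convenience described above.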
eddi answered Nov 15 '22 14:11
