This is more of a curiosity than a question, but I was wondering why data.table's CJ function returns an object with the rightmost index varying fastest (as opposed to the base expand.grid function, where the leftmost index varies fastest).
An example:
CJ(a=letters[1:2],b=LETTERS[1:2])
# a b
#1: a A
#2: a B
#3: b A
#4: b B
expand.grid(a=letters[1:2],b=LETTERS[1:2])
# a b
#1 a A
#2 b A
#3 a B
#4 b B
I think that having the leftmost index vary fastest is more R-ish. Is there a reason for CJ to follow the other order?
It's convenient to have the result of CJ sorted like that because it can then be keyed by all of its columns (and it is). That enables operations like this:
dt = data.table(a = c(1,2,1), b = 1:3, c = c('a', 'a', 'b'))
setkey(dt, a, c)
# a b c
#1: 1 1 a
#2: 1 3 b
#3: 2 2 a
dt[CJ(unique(a), unique(c))]
# a b c
#1: 1 1 a
#2: 1 3 b
#3: 2 2 a
#4: 2 NA b
# just checking the key:
key(dt[, CJ(unique(a), unique(c))])
#[1] "V1" "V2"
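To make the connection between row order and keying concrete, here is a minimal sketch comparing the two functions. The object names grid_cj and grid_eg are just illustrative, and it assumes CJ is called with its default sorted = TRUE:

```r
library(data.table)

# CJ() sorts its result by default (sorted = TRUE) and keys it by every column.
grid_cj <- CJ(a = letters[1:2], b = LETTERS[1:2])
key(grid_cj)
# [1] "a" "b"

# expand.grid() varies the leftmost column fastest, so its rows are not in
# key order and the converted table carries no key.
grid_eg <- as.data.table(expand.grid(a = letters[1:2], b = LETTERS[1:2],
                                     stringsAsFactors = FALSE))
key(grid_eg)
# NULL

# Keying it explicitly re-sorts the rows into the order CJ() produced directly.
setkey(grid_eg, a, b)
identical(grid_cj$a, grid_eg$a) && identical(grid_cj$b, grid_eg$b)
# [1] TRUE
```

In other words, the expand.grid order would need an extra sort before it could be keyed, whereas the CJ order is already the sorted order.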