Why doesn't data.table's CJ respect column-major order?

Question

This is more a curiosity than a question: why does data.table's CJ function return an object in which the rightmost index varies fastest (as opposed to base R's expand.grid function, where the leftmost index varies fastest)?

An example:

library(data.table)

CJ(a=letters[1:2],b=LETTERS[1:2])
#   a b
#1: a A
#2: a B
#3: b A
#4: b B
expand.grid(a=letters[1:2],b=LETTERS[1:2])
#  a b
#1 a A
#2 b A
#3 a B
#4 b B

I think that having the leftmost index vary fastest is more R-ish. Is there a reason for CJ to use the other order?

eddi · Accepted Answer

It's convenient to have the result of CJ sorted this way, because it can then be keyed by all of its columns (and CJ does set that key). That in turn enables operations like this:

dt = data.table(a = c(1,2,1), b = 1:3, c = c('a', 'a', 'b'))
setkey(dt, a, c)
#   a b c
#1: 1 1 a
#2: 1 3 b
#3: 2 2 a

dt[CJ(unique(a), unique(c))]
#   a  b c
#1: 1  1 a
#2: 1  3 b
#3: 2  2 a
#4: 2 NA b

# just checking the key:
key(dt[, CJ(unique(a), unique(c))])
#[1] "V1" "V2"
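
To make the contrast with expand.grid concrete, here is a minimal sketch (not part of the original answer). It assumes a reasonably recent data.table; the object names grid_cj, grid_eg and grid_left are made up for illustration. It shows that the CJ result carries a key while the expand.grid result does not, and one possible way to get the leftmost-fastest order out of CJ if that order is really wanted.

```r
library(data.table)

# CJ sorts its output and keys it by all columns.
grid_cj <- CJ(a = letters[1:2], b = LETTERS[1:2])
key(grid_cj)
# [1] "a" "b"

# expand.grid has no notion of a key; after conversion the key is NULL.
grid_eg <- as.data.table(
  expand.grid(a = letters[1:2], b = LETTERS[1:2], stringsAsFactors = FALSE)
)
key(grid_eg)
# NULL

# One way to mimic the expand.grid row order with CJ: pass the columns
# in reverse, then put them back in the desired column order.
# Subsetting columns with j returns an unkeyed data.table.
grid_left <- CJ(b = LETTERS[1:2], a = letters[1:2])[, .(a, b)]

# Both grids now list their rows in the same order.
identical(grid_left$a, grid_eg$a) && identical(grid_left$b, grid_eg$b)
```

The reordered grid_left gives up the key on (a, b), so it loses the property that makes CJ convenient as the i argument of a keyed join.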

Tags: r, data.table

nicola
