I'm happy to see that data.table has a new release, and I have a question about J(). From the data.table NEWS for 1.9.2:
x[J(2), a], where a is the key column, sees a in j, #2693 and FAQ 2.8. Also, x[J(2)] automatically names the columns from i using the key columns of x. In cases where the key columns of x and i are identical, i's columns can be referred to by using i.name; e.g., x[J(2), i.a]
There are several questions about J() on S.O., and the introduction to data.table talks about the binary search of J(). But my understanding of J() is still not very clear.
All I know is that, if I want to select the rows where column A is "b" and column B is "d":
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)
DT2[J("b", "d")]
And if I want to select the rows where A is "a" or "c", I write:
DT2[A == "a" | A == "c"]
much like the data.frame way. (Minor question: is there a more data.table-like way to select these rows?)
So, to my understanding, J() is only useful in the case above: selecting a single value from each of two different columns. I hope my understanding is wrong. There is little documentation about J(). I read "How is J() function implemented in data.table?", which says that J(.) is detected and simply replaced with list(.). It seems that list(.) can replace J(.) in every case.
And back to the question: what is the purpose of this new feature, x[J(2), a]?
I would really appreciate a detailed explanation!
When they wrap the i argument of a data.table, .() and J() are simply replaced by list(), because [.data.table does some programming on the language of the i and j arguments to optimize how things are done internally. They can be thought of as aliases for list.
The reason they are included is to save time and effort (3 keystrokes!).
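As a minimal sketch of that equivalence, the snippet below rebuilds the DT2 table from the question and runs the same keyed lookup with all three wrappers. The result names (r_dot, r_J, r_list) are only illustrative, and the comparison assumes a reasonably recent data.table release.

```r
library(data.table)

# Table and key taken from the question.
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)

# The same keyed lookup written three ways.
r_dot  <- DT2[.("b", "d")]
r_J    <- DT2[J("b", "d")]
r_list <- DT2[list("b", "d")]

# All three should return the single row A = "b", B = "d", C = 2.
print(r_dot)
print(isTRUE(all.equal(r_dot, r_J)))
print(isTRUE(all.equal(r_J, r_list)))
```

Which wrapper to use inside i is then mostly a matter of taste; .() is simply the shortest to type.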
If I wanted to select key values 'a' or 'c' from the first column of a key, I could do
DT[.(c('a','c'))]
# or
DT[J(c('a','c'))]
# or
DT[list(c('a','c'))]
If I wanted A = 'a' or 'c' and B = 'd', then I could use
DT[.(c('a','c'),'d')]
If I wanted A = 'a' or 'c' and B = 'd' or 'e', then I would use CJ (or expand.grid) to create all combinations
DT[CJ(c('a','c'),c('d','e'))]
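To see what these subsets return, here is a small worked sketch that applies them to the DT2 table from the question rather than the generic DT above. The result names are illustrative, and nomatch = 0L is added only to show how non-matching key combinations can be dropped; it is not part of the calls above.

```r
library(data.table)

# Table and key taken from the question.
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)

# A is "a" or "c": matches on the first key column only.
by_first <- DT2[.(c("a", "c"))]
print(by_first)

# Every combination of A in {a, c} and B in {d, e}.
# Combinations that do not occur in DT2 come back with C = NA.
all_combos <- DT2[CJ(c("a", "c"), c("d", "e"))]
print(all_combos)

# Same lookup, keeping only the combinations that exist in DT2.
matched <- DT2[CJ(c("a", "c"), c("d", "e")), nomatch = 0L]
print(matched)
```

With this data only the combination A = "c", B = "e" exists, so all_combos has four rows (three of them with C = NA) while matched has a single row.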
The help for J, SJ and CJ is quite well written! See also the vignette Keys and fast binary search based subset.