
About the new J() feature in data.table 1.9.2

Tags: r, data.table

I'm happy to see that data.table has a new release, and I have a question about J(). From the data.table NEWS for 1.9.2:

x[J(2), a], where a is the key column sees a in j, #2693 and FAQ 2.8. Also, x[J(2)] automatically names the columns from i using the key columns of x. In cases where the key columns of x and i are identical, i's columns can be referred to by using i.name; e.g., x[J(2), i.a]
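A minimal sketch of the first two points in that NEWS item, assuming a recent data.table; the table x and its columns a and v are made up for illustration and are not from the NEWS file:

```r
library(data.table)

# Hypothetical toy table: `a` is the key column, `v` is a payload column.
x <- data.table(a = c(1, 2, 2, 3), v = c(10L, 20L, 30L, 40L), key = "a")

# Keyed lookup of a == 2. The unnamed J(2) column is matched against the key,
# and the result carries x's column names.
res <- x[J(2)]
print(names(res))

# j can refer to the key column `a` directly.
a_vals <- x[J(2), a]
print(a_vals)
```

Here x[J(2), a] returns the values of the key column for the matched rows, which is what "sees a in j" appears to mean.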

There are several questions about J() on S.O., and the introduction to data.table also talks about the binary search that J() uses. But my understanding of J() is still not very clear.

All I know is that if I want to select the rows where column A is "b" and column B is "d", I write:

DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)
DT2[J("b", "d")]

and if I want to select the rows where A is "a" or "c", I code it like this:

DT2[A == "a" | A == "c"]

much like the data.frame way. (Minor question: how can I select these rows in a more data.table way?)

So, to my understanding, J() is only used in the case above: selecting a single value from each of two different key columns.

I hope my understanding is wrong. There is little documentation about J(). I read "How is J() function implemented in data.table?", which says that J(.) is detected and simply replaced with list(.).

It seems that list(.) can replace J(.) in every case.

And back to the question: what is the purpose of this new feature, x[J(2), a]?

I'd really appreciate a detailed explanation!

Bigchao asked Mar 02 '14 15:03


1 Answer

.() and J(), when used as the function wrapping the i argument of a data.table, are simply replaced by list(), because [.data.table does some programming on the language of the i and j arguments to optimize how things are done internally. Each can be thought of as an alias for list.

The reason they are included is to save time and effort (3 keystrokes!)
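As a quick check of that equivalence, here is a small sketch that reuses the DT2 table from the question and assumes a recent data.table. It compares the three spellings with all.equal():

```r
library(data.table)

# Table and key from the question.
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)

# The same keyed lookup written three ways.
r_J    <- DT2[J("b", "d")]
r_dot  <- DT2[.("b", "d")]
r_list <- DT2[list("b", "d")]

# All three should give the same single row (A = "b", B = "d", C = 2).
print(isTRUE(all.equal(r_J, r_dot)))
print(isTRUE(all.equal(r_J, r_list)))
```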

If I wanted to select the key values 'a' or 'c' from the first column of a key, I could do

DT[.(c('a','c'))]
# or
DT[J(c('a','c'))]
# or
DT[list(c('a','c'))]
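Applied to the DT2 table from the question (an assumption, since DT above is not defined in this answer), the keyed lookup can be compared with the vector-scan version from the question like this:

```r
library(data.table)

# Table and key from the question.
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)

# Keyed lookup on the first key column only.
keyed <- DT2[.(c("a", "c"))]

# Vector scan, as written in the question.
scanned <- DT2[A == "a" | A == "c"]

# Both select the rows with C = 1 and C = 3.
print(keyed)
print(identical(keyed$C, scanned$C))
```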

If I wanted A = 'a' or 'c' and B = 'd', then I could use

DT[.(c('a','c'),'d')]

If I wanted A = 'a' or 'c' and B = 'd' or 'e' then I would use CJ (or expand.grid) to create all combinations

DT[CJ(c('a','c'),c('d','e'))]
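To illustrate the difference between the last two forms, here is a sketch on the question's DT2 (again an assumption, since DT is not defined here). In .() the single 'd' is recycled against both values of A, while CJ() builds every combination; nomatch = NULL, available in recent data.table versions, drops combinations that are not in the table:

```r
library(data.table)

# Table and key from the question.
DT2 <- data.table(A = letters[1:5], B = letters[3:7], C = 1:5)
setkey(DT2, A, B)

# .() recycles "d": the lookup is for the pairs (a, d) and (c, d).
pairs <- DT2[.(c("a", "c"), "d")]

# CJ() crosses the two vectors: (a, d), (a, e), (c, d), (c, e).
combos <- CJ(c("a", "c"), c("d", "e"))
print(nrow(combos))

# Only (c, e) exists in DT2; nomatch = NULL keeps matched rows only.
matched <- DT2[combos, nomatch = NULL]
print(matched)
```

Without nomatch = NULL, unmatched combinations come back as rows with NA in the non-key columns, which is what happens to both rows of pairs here.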

The help for J, SJ and CJ is quite well written! See also the vignette "Keys and fast binary search based subset".

mnel answered Oct 24 '22 20:10