I'm trying to learn the data.table package in R. I have a data table named DT1 and a data frame DF1, and I'd like to subset some rows according to a logical condition (a disjunction). This is my code for now:
DF1[DF1$c1==0 | DF1$c2==1,] #the data.frame way with the data.frame DF1
DT1[DT1$c1==0 | DT1$c2==1,] #the data.frame way with the data.table DT1
On page 5 of "Introduction to the data.table package in R", the author gives a similar example but with a conjunction (replace | by & in the second line above) and remarks that this is a bad use of the data.table package. He suggests doing it this way instead:
setkey(DT1,c1,c2)
DT1[J(0,1)]
So, my question is: how can I write the disjunction condition with data.table syntax? Is my second line, DT1[DT1$c1==0 | DT1$c2==1,], a misuse? Is there an equivalent of J for a disjunction?
That document indicates that you could have used:
DT1[c1==0 | c2==1, ]
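As a minimal sketch of that form, assuming a small made-up table (the values of c1 and c2 below are purely illustrative), column names can be used directly inside the brackets, so the DT1$ prefix is not needed:

```r
library(data.table)

# Hypothetical toy data; only the column names c1 and c2 come from the question.
DT1 <- data.table(c1 = c(0, 0, 1, 2), c2 = c(1, 3, 1, 5))

# Inside [ ], columns are visible as variables, so no DT1$ prefix is needed.
res <- DT1[c1 == 0 | c2 == 1]

# Compare with the data.frame-style call from the question.
all.equal(res, DT1[DT1$c1 == 0 | DT1$c2 == 1, ])
```

Both calls do a vector scan over the columns; the shorter form is just the more idiomatic way to write it.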
Here is another solution:
library(data.table)
grpsize = ceiling(1e7/26^2)
DT <- data.table(
  x=rep(LETTERS,each=26*grpsize),
  y=rep(letters,each=grpsize),
  v=runif(grpsize*26^2))
setkey(DT, x)
system.time(DT1 <- DT[x=="A" | x=="Z"])   # vector scan
#   user  system elapsed
#   0.68    0.05    0.74
system.time(DT2 <- DT[J(c("A", "Z"))])    # keyed lookup
#   user  system elapsed
#   0.08    0.00    0.07
all.equal(DT1[, v], DT2[, v])
# TRUE
Note that I took the example from the data.table documentation. The only difference is that I no longer convert the letters to factors, because character keys are now allowed (see NEWS for v1.8.0).
A short explanation: J is just short for data.table. So if you call J(0, 1) you create a data.table with two columns that match, just like in the example:
> J(0,1)
V1 V2
[1,] 0 1
You, however, want to match two different elements in one column. Therefore, you need a data.table with one column. So just add c().
J(c(0,1))
V1
[1,] 0
[2,] 1
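To tie this back to a keyed lookup, here is a small sketch on made-up data (the table and its values are hypothetical). It uses J inside DT[...] with a single-column key and checks that it selects the same rows as the | condition on that column:

```r
library(data.table)

# Hypothetical toy data with one column to be used as the key.
DT <- data.table(c1 = c(2, 0, 1, 0, 3), v = 1:5)
setkey(DT, c1)  # sorts DT by c1 and marks c1 as the key

# Keyed lookup: rows whose c1 is 0 or 1.
keyed <- DT[J(c(0, 1))]

# Vector-scan equivalent on the same column.
scanned <- DT[c1 == 0 | c1 == 1]

all.equal(keyed$v, scanned$v)
```

This only covers a disjunction over values of one column. For a condition spanning two different columns, the DT1[c1==0 | c2==1] form shown earlier is the direct way to write it.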