
How would you translate this into data.table package language in R?

Tags: r, data.table

I'm trying to learn the data.table package in R. I have a data table named DT1 and a data frame named DF1, and I'd like to subset the rows that satisfy a logical condition (a disjunction). This is my code for now:

DF1[DF1$c1==0 | DF1$c2==1,] #the data.frame way with the data.frame DF1
DT1[DT1$c1==0 | DT1$c2==1,] #the data.frame way with the data.table DT1

On page 5 of "Introduction to the data.table package in R", the author gives a similar example, but with a conjunction (replace | by & in the second line above), and remarks that this is a bad use of the data.table package. He suggests doing it this way instead:

setkey(DT1,c1,c2)
DT1[J(0,1)]

So, my question is: how can I write the disjunction condition in data.table syntax? Is my second line, DT1[DT1$c1==0 | DT1$c2==1,], a misuse of the package? Is there an equivalent of J for a disjunction?

nhern121 asked May 21 '12 18:05



2 Answers

That document indicates that you could have used:

DT1[c1==0 | c2==1, ]
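
As a minimal sketch of why this works: inside DT1[...], column names are looked up in the table itself, so the DT1$ prefix can be dropped. The toy data below is hypothetical (only the names DT1, DF1, c1 and c2 come from the question; the id column is added for illustration), and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data; c1 and c2 follow the question, id is illustrative.
DF1 <- data.frame(c1 = c(0, 1, 2, 0), c2 = c(1, 1, 0, 0), id = 1:4)
DT1 <- as.data.table(DF1)

# Column names are evaluated within DT1, so no DT1$ prefix is needed.
res_dt <- DT1[c1 == 0 | c2 == 1]

# The same logical condition written the data.frame way.
res_df <- DF1[DF1$c1 == 0 | DF1$c2 == 1, ]

# Both approaches select the same rows.
stopifnot(identical(res_dt$id, res_df$id))
```

Note that this form is still a vector scan over both columns; it is the idiomatic way to write the condition, not a keyed lookup.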
IRTFM answered Sep 27 '22 22:09


Here is another solution:

library(data.table)

grpsize <- ceiling(1e7/26^2)
DT <- data.table(
  x=rep(LETTERS,each=26*grpsize),
  y=rep(letters,each=grpsize),
  v=runif(grpsize*26^2))

setkey(DT, x)
system.time(DT1 <- DT[x=="A" | x=="Z"])   # vector scan
#    user  system elapsed 
#    0.68    0.05    0.74 
system.time(DT2 <- DT[J(c("A", "Z"))])    # keyed lookup
#    user  system elapsed 
#    0.08    0.00    0.07 
all.equal(DT1[, v], DT2[, v])
# TRUE

Note that I took the example from the data.table document. The only difference is that I no longer convert the letters into factors, because character keys are now allowed (see NEWS for v1.8.0).

A short explanation: J is just shorthand for data.table. So calling J(0, 1) creates a data.table with one row and two columns, which is matched against the two key columns, just like in the example:

> J(0,1)
     V1 V2
[1,]  0  1

You, however, want to match two different elements in one column. Therefore, you need a data.table with one column. So just add c().

J(c(0,1))
     V1
[1,]  0
[2,]  1
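
To make the one-column lookup concrete, here is a small sketch on a hypothetical toy table (the column names x and v follow the example above; the values are made up). It assumes the data.table package is installed and uses J only inside DT[...]. The nomatch = 0L argument is an addition for illustration: it drops requested key values that do not occur in the table instead of returning NA rows.

```r
library(data.table)

# Hypothetical toy table; x and v follow the example above.
DT <- data.table(x = c("A", "B", "Z", "A", "C"), v = 1:5)
setkey(DT, x)  # sorts DT by x and marks x as the key

# Vector scan: every row is tested against both conditions.
scan_res <- DT[x == "A" | x == "Z"]

# Keyed lookup: a one-column table holding the wanted key values.
# nomatch = 0L drops key values that are absent from DT.
join_res <- DT[J(c("A", "Z")), nomatch = 0L]

# Both return the same rows in the same order.
stopifnot(identical(scan_res$v, join_res$v))
```

This covers a disjunction over values of a single key column. A condition spanning two different columns, such as c1==0 | c2==1, does not map onto a single J() call in this way.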
Christoph_J answered Sep 27 '22 23:09