 

Understanding .I in data.table in R

I was playing around with data.table and I came across a distinction that I'm not sure I quite understand. Given the following dataset:

library(data.table)

set.seed(400)
DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT

Can you please explain to me the difference between the following expressions?

1) DT[J("E"), .I]

2) DT[ , .I[x == "E"] ]

3) DT[x == "E", .I]

asked May 10 '14 20:05 by black_sheep07




1 Answer

set.seed(400)
library(data.table)

DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT

1)

DT[  , .I[x == "E"] ] # [1] 18 19 20

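returns an integer vector, not a data.table. With an empty i, .I holds the row numbers of the ORIGINAL dataset DT (1 to 20), so indexing it with x == "E" gives the positions of the E rows in DT. This is expression 2) in the question.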
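A minimal sketch of the same idea on fixed data, so the result does not depend on the random sample. The table `toy` and its values are hypothetical and are not the asker's data.

```r
library(data.table)

# Hypothetical toy data (not the asker's random sample); key = "x" sorts by x
toy <- data.table(x = c("A", "A", "B", "C", "E", "E", "E"), key = "x")

# With an empty i, .I is simply the row numbers 1..nrow(toy)
all_rows <- toy[, .I]

# Indexing .I with a logical vector keeps the positions where it is TRUE,
# i.e. the rows of toy in which x is "E"
e_rows <- toy[, .I[x == "E"]]

print(all_rows)
print(e_rows)
```

Here the E rows sit at the end of the keyed table, just as in the answer's example.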

2)

DT[J("E")  , .I]   # [1] 1 2 3

DT["E"     , .I]   # [1] 1 2 3

DT[x == "E", .I]   # [1] 1 2 3

all return the same integer vector. Here i is evaluated first, so .I numbers the rows of the NEW subsetted data (1 to 3) rather than their positions in DT. These cover expressions 1) and 3) in the question.
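If the goal is to subset in i and still get positions in the original table, one option is the `which = TRUE` argument of `[.data.table`. The sketch below uses a hypothetical table `toy2` with fixed values. What `.I` returns after a subset in i is taken from the answer above and may vary between data.table versions, so only its length is checked here.

```r
library(data.table)

# Hypothetical toy data (not the asker's random sample)
toy2 <- data.table(x = c("A", "A", "B", "C", "E", "E", "E"), key = "x")

# .I after a subset in i: per the answer above, numbered within the subset
sub_idx <- toy2[x == "E", .I]

# which = TRUE returns the row numbers of toy2 that i matches
orig_idx <- toy2[x == "E", which = TRUE]

print(sub_idx)
print(orig_idx)
```

Both results have one entry per matching row. Only `orig_idx` is guaranteed to refer to positions in `toy2`.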

answered Oct 11 '22 05:10 by Ragy Isaac