I was playing around with data.table and I came across a distinction that I'm not sure I quite understand. Given the following dataset:
library(data.table)
set.seed(400)
DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT
Can you please explain to me the difference between the following expressions?
1) DT[J("E"), .I]
2) DT[ , .I[x == "E"] ]
3) DT[x == "E", .I]
Indicates the rows on which the values must be updated with. If not provided, implies all rows. The := form is more powerful as it allows subsets and joins based add/update columns by reference.
Think of .N as a variable for the number of instances. For example: dt <- data.table(a = LETTERS[c(1,1:3)], b = 4:7) dt[.N] # returns the last row # a b # 1: C 7.
data. table(DT) is TRUE. To better description, I put parts of my original code here. So you may understand where goes wrong.
set.seed(400)
library(data.table)
DT <- data.table(x = sample(LETTERS[1:5], 20, TRUE), key = "x"); DT
1)
DT[ , .I[x == "E"] ] # [1] 18 19 20
is a data.table where .I is a vector representing the row number of E in the ORIGINAL dataset DT
2)
DT[J("E") , .I] # [1] 1 2 3
DT["E" , .I] # [1] 1 2 3
DT[x == "E", .I] # [1] 1 2 3
are all the same, producing a vector where .Is are vectors representing the row numbers of the Es in the NEW subsetted data
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With