Would someone please explain to me the correct usage of .I
for returning the row numbers of a data.table?
I have data like this:
require(data.table) DT <- data.table(X=c(5, 15, 20, 25, 30)) DT # X # 1: 5 # 2: 15 # 3: 20 # 4: 25 # 5: 30
I want to return a vector of row indices where a condition in i
is TRUE
, e.g. which rows have an X
greater than 20.
DT[X > 20] # rows 4 & 5 are greater than 20
To get the indices, I tried:
DT[X > 20, .I] # [1] 1 2
...but clearly I am doing it wrong, because that simply returns a vector containing 1 to the number of returned rows. (Which I thought was pretty much what .N
was for?).
Sorry if this seems extremely basic, but all I have been able to find in the data.table documentation is WHAT .I
and .N
do, not HOW to use them.
Data. table is an extension of data. frame package in R. It is widely used for fast aggregation of large datasets, low latency add/update/remove of columns, quicker ordered joins, and a fast file reader.
data. table(DT) is TRUE. To better description, I put parts of my original code here. So you may understand where goes wrong.
With command rbindlist from the data. table package, we can append dt_add_row and new_row row-wise. Object dt_add_row, shown in Table 2, shows the original data. table with the added row number 6.
Description. Working with rows is a fundamental part of DataTables, and you want to be able to easily select the rows that you want from the table. This method is the row counterpart to the columns() and cells() methods for working with columns and cells in the table, respectively.
If all you want is the row numbers rather than the rows themselves, then use which = TRUE
, not .I
.
DT[X > 20, which = TRUE] # [1] 4 5
That way you get the benefits of optimization of i
, for example fast joins or using an automatic index. The which = TRUE
makes it return early with just the row numbers.
Here's the manual entry for the which
argument inside data.table :
TRUE
returns the row numbers ofx
thati
matches to. IfNA
, returns the row numbers ofi
that have no match inx
. By defaultFALSE
and the rows inx
that match are returned.
Notice there is a specific relationship between .I
and the i = ..
argument in DT[i = .., j = .., by = ..]
Namely, .I
is a vector of row numbers of the subsetted table.
### Lets create some sample data set.seed(1) LL <- sample(LETTERS[1:5], 20, TRUE) DT <- data.table(X=LL)
.I
DT[X == "B", .I] # [1] 1 2 3 4 5 6 DT[ , .I[X == "B"] ] # [1] 1 2 5 11 14 19
Sorry if this seems extremely basic, but all I have been able to find in the data.table documentation is WHAT .I and .N do, not HOW to use them.
First let's check the documentation. I typed ?data.table
and searched for .I
. Here's what's there :
Advanced: When grouping, symbols .SD, .BY, .N, .I and .GRP may be used in the j expression, defined as follows.
.I is an integer vector equal to seq_len(nrow(x)). While grouping, it holds for each item in the group its row location in x. This is useful to subset in j; e.g. DT[, .I[which.max(somecol)], by=grp].
Emphasis added by me here. The original intention was for .I
to be used while grouping. Note that there is in fact an example there in the documentation of HOW to use .I
.
You aren't grouping.
That said, what you tried was reasonable. Over time these symbols have become to be used when not grouping as well. There might be a case that .I
should return what you expected. I can see that using .I
in j
together with both i
and by
could be useful. Currently .I
doesn't seem helpful when i
is present, as you pointed out.
Using the which()
function is good but might then circumvent optimization in i
(which()
needs a long logical input which has to be created and passed to it). Using the which=TRUE
argument is good but then just returns the row numbers (you couldn't then do something with those row numbers in j
by group).
Feature request #1494 filed to discuss changing .I
to work the way you expected. The documentation does contain the words "its row location in x" which would imply what you expected since x
is the whole data.table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With