
data.table subsetting rows using a logical column: why do I have to explicitly compare with TRUE? [duplicate]

Tags: r, data.table

I'm wondering why for the given data.table:

library(data.table)
DT <- structure(list(number = 1:5, bmask = c(FALSE, TRUE, FALSE, TRUE, 
FALSE)), .Names = c("number", "bmask"), row.names = c(NA, -5L
), class = c("data.table", "data.frame"))

> DT
   number bmask
1:      1 FALSE
2:      2  TRUE
3:      3 FALSE
4:      4  TRUE
5:      5 FALSE

the expression DT[bmask==T,.(out=number)] works as expected:

   out
1:   2
2:   4

but DT[bmask,.(out=number)] causes an error:

> DT[bmask,.(out=number)]
Error in eval(expr, envir, enclos) : object 'bmask' not found

Is this the intended behavior of the data.table package?

Marat Talipov asked Jan 16 '15 17:01



1 Answer

Use this instead:

DT[(bmask), .(out=number)]
#    out
# 1:   2
# 2:   4

The parentheses wrap the symbol bmask in a function call, and it is from that call's evaluation environment that the columns of DT are visible (see footnote 1). Any other function call that simply returns bmask's value (e.g. c(bmask), I(bmask), or bmask==TRUE) or the indices of its true elements (e.g. which(bmask)) will work just as well, but may take slightly longer to compute.
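As a minimal sketch of the point above, the following compares the alternatives mentioned there. The names res_paren, res_c, res_which and res_eq are made up for this illustration, and DT is rebuilt with data.table() rather than structure() purely for brevity.

```r
library(data.table)

# Same data as in the question, built with data.table() for brevity.
DT <- data.table(number = 1:5, bmask = c(FALSE, TRUE, FALSE, TRUE, FALSE))

# Each 'i' below is a function call, so bmask is resolved as a column of DT.
res_paren <- DT[(bmask), .(out = number)]        # `(` is itself a function
res_c     <- DT[c(bmask), .(out = number)]       # c() just returns the logical vector
res_which <- DT[which(bmask), .(out = number)]   # integer indices of the TRUE elements
res_eq    <- DT[bmask == TRUE, .(out = number)]  # explicit comparison, as in the question

# All four select the same rows.
all_same <- identical(res_paren$out, res_c$out) &&
  identical(res_paren$out, res_which$out) &&
  identical(res_paren$out, res_eq$out)
print(all_same)
```

Which form to prefer is mostly a matter of readability; the parenthesized form is the shortest.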

If bmask is not inside a function call, it is looked up in the calling scope (here, the global environment) rather than among the columns of DT, which can also be handy at times. Here's the relevant explanation from ?data.table:

Advanced: When 'i' is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.
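To illustrate the quoted rule, here is a small sketch in which a bare symbol in i is taken from the calling scope. The variable rows is hypothetical and deliberately not a column of DT. The exact wording of the error for a column-only name depends on the data.table version, so the sketch only captures the message rather than assuming its text.

```r
library(data.table)

DT <- data.table(number = 1:5, bmask = c(FALSE, TRUE, FALSE, TRUE, FALSE))

# 'rows' (hypothetical name) lives in the calling scope and is not a column of DT.
rows <- c(1L, 5L)

# A single variable name in 'i' is evaluated in the calling scope,
# so this selects rows 1 and 5 of DT.
res_scope <- DT[rows, .(out = number)]
print(res_scope$out)

# 'bmask' exists only as a column, so the bare symbol is not found in the
# calling scope and the call fails; keep the message instead of stopping.
err <- tryCatch(
  DT[bmask, .(out = number)],
  error = function(e) conditionMessage(e)
)
print(is.character(err))
```

This is the "handy" case mentioned above: a vector of row numbers or a logical mask computed elsewhere can be passed by name without any wrapping.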


Footnote 1: To see that ( is itself a function, so that (bmask) is a function call, type is(`(`).

Josh O'Brien answered Oct 23 '22 07:10