
Column name equal to variable name [duplicate]

Tags: r, data.table

How can I handle this case?

d <- data.table(
  a = c(1, 2, 3)
)
a <- 2
d[a == a]

Gives:

   a
1: 1
2: 2
3: 3

Expected result is:

   a
1: 2
Artem Klevtsov asked Sep 09 '19 14:09


2 Answers

I've inadvertently fallen prey to this, too. This is about scoping in general, not just data.table (dplyr will do the same thing). You can find similar problems in a function body by defining a variable that masks the same-named variable in the parent (or global) environment.

a <- 1
# a brain-dead example
myfunc <- function() {
  a <- 2
  a - a
}

In that function example, it's "obvious" that there is complete ambiguity in which a is which, and it is effectively the same thing in your question. At no point is it safe to assume that the LHS or RHS of the equality test is always inside the data and the other is always outside; the search begins inside and moves outside only if nothing is found. Because a is found inside on both sides of ==, both LHS and RHS use the 'inside' value, i.e. the column.
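The same lookup order can be reproduced in base R with with(), which evaluates an expression inside the data first and falls back to the enclosing environment. This is only an analogy, not data.table's internals; the names df and threshold are made up for the illustration.

```r
# Base-R analogy of the lookup order (not data.table's actual internals).
df <- data.frame(a = c(1, 2, 3))
a <- 2

# Both `a`s are found inside df, so the column is compared with itself.
same_name <- with(df, a == a)
print(same_name)   # TRUE TRUE TRUE

# `threshold` (hypothetical name) is not a column, so it is found outside df.
threshold <- 2
other_name <- with(df, a == threshold)
print(other_name)  # FALSE TRUE FALSE
```

Only the second comparison reaches the variable outside the data, because nothing named threshold exists inside it.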

I have wondered if it would be difficult to say "if it is found 'inside' on the LHS, then do not look 'inside' on the RHS", but I think that adds unnecessary complexity, especially since the lookup is effectively done by R's normal variable search rather than by hand-written code within the package.

Best answer? Don't do it. Workaround? One of:

# this "should" always work, assuming "where" the external variable is defined
d[a == get("a", envir=parent.frame())]

# this only works if it is truly in the global environment
d[a == get("a", envir=globalenv())]

# and @nicola's suggestion:
d[a == evalq(a, envir=parent.frame())]
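
The "don't do it" route is simply to give the outside variable a name that is not a column. A minimal sketch, assuming data.table is installed; a_value is a hypothetical name chosen for the example:

```r
library(data.table)

d <- data.table(a = c(1, 2, 3))
a_value <- 2  # hypothetical name; anything that is not a column of d

# `a` resolves to the column, `a_value` to the variable outside d.
res <- d[a == a_value]
print(res)
```

This avoids any dependence on which environment the variable lives in.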
r2evans answered Oct 21 '22 19:10


There are alternative solutions to this issue: the .. symbol prefix, setkey(), and get("a", pos = 1L).

For testing, a more sophisticated sample dataset is used where the value 2 of a is not located in row 2:

library(data.table)
d <- data.table(rn = 1:4, a = c(1, 4, 3, 2)) # a more subtle test case
a <- 2

d
   rn a
1:  1 1
2:  2 4
3:  3 3
4:  4 2

The .. symbol prefix

According to data.table NEWS on version 1.10.2 (Jan 2017):

When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers.

Unfortunately, this currently is only available for the j= parameter but not for the i= parameter. (According to data.table NEWS on version 1.11.0, May 2018, new features item 18 this might be expanded to symbols appearing in i= and by=, too.)

However, we can use this to create a vector of logical values or a vector of indices which can be used for subsequent subsetting:

library(magrittr)
d[, a == ..a] %>% d[.]
   rn a
1:  4 2

or

d[, .I[a == ..a]] %>% d[.]
   rn a
1:  4 2

For subsetting, magrittr style piping is used as the .. symbol prefix is not yet implemented to be used in i=. Thus, using d[, a == ..a] directly as i= parameter, i.e., d[i = d[, a == ..a]] will not work (it returns d unsubsetted).
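
If magrittr is not wanted, the same two-step idea can be written with an intermediate variable. This sketch assumes the ..a lookup in j behaves as in the calls shown earlier in this answer; idx is a made-up name. Because idx is a single symbol and not a column of d, data.table evaluates it in the calling scope.

```r
library(data.table)

d <- data.table(rn = 1:4, a = c(1, 4, 3, 2))
a <- 2

# Step 1: compute the matching row numbers in j, where ..a is the outer `a`.
idx <- d[, .I[a == ..a]]

# Step 2: subset d by those row numbers.
res <- d[idx]
print(res)
```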

setkey()

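For the less general use case of filtering d by the local variable a, setkey() can be used. Note that a bare symbol in i is evaluated in the calling scope, so d[a] below is d[2], and a numeric i selects by row number rather than joining on the key. It returns the expected row here only because, after keying, the row with a == 2 happens to sit in position 2; a keyed join on a value is written d[.(2)].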

setkey(d, a)
d[a]
   rn a
1:  4 2
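
To see the difference between a keyed join and row-number indexing, here is a small sketch with a table where the two do not coincide. d2 and its values are made up for the illustration; .() in i is data.table's alias for list() and triggers a join on the key.

```r
library(data.table)

d2 <- data.table(rn = 1:4, a = c(5, 4, 3, 1))  # hypothetical data
setkey(d2, a)  # sorts d2 by a: 1, 3, 4, 5

a <- 3
by_row <- d2[a]     # single symbol in i: evaluated outside d2, i.e. d2[3]
by_key <- d2[.(3)]  # keyed join: the row whose key column a equals 3

print(by_row)  # third row of the sorted table (a == 4)
print(by_key)  # the row with a == 3
```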

get("a", pos = 1L)

get("a", pos = 1L), or even shorter get("a", 1L), looks a up at position 1 of the search path, which is the global environment. It is therefore shorthand for get("a", envir = globalenv()), which has already been suggested in r2evans' answer.

get("a", 1L) is roughly equivalent to ..a when a lives in the global environment, but it is more robust in that it can be used directly in i=:

d[a == get("a", 1L)]
   rn a
1:  4 2
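
One caveat worth illustrating: position 1 on the search path is the global environment, so get("a", 1L) ignores any local a. A base-R sketch (no data.table involved); the function name f is made up:

```r
# Define `a` in the global environment (same as typing `a <- 2` at the console).
assign("a", 2, envir = globalenv())

f <- function() {
  a <- 99  # local variable that masks the global one
  c(global = get("a", pos = 1L),  # looks in the global environment
    local  = get("a"))            # default: starts in the current frame
}

out <- f()
print(out)  # global = 2, local = 99
```

Inside a function, then, this form picks up the global value rather than the caller's local one.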
Uwe answered Oct 21 '22 17:10