
data.table does not play well with checkUsage

Tags: r, data.table

data.table is a wonderful package, which, alas, generates unwarranted warnings from checkUsage (the code comes from here and here):

> library(data.table)
> library(codetools)  # provides checkUsage()
> library(compiler)
> compiler::enableJIT(3)
> dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}
> checkUsage(my.func)
<anonymous>: no visible binding for global variable ‘.SD’ (:2)
<anonymous>: no visible binding for global variable ‘a’ (:2)
<anonymous>: no visible binding for global variable ‘count’ (:3)
<anonymous>: no visible binding for global variable ‘.N’ (:3)
<anonymous>: no visible binding for global variable ‘a’ (:3)
> my.func(dt)
Note: no visible binding for global variable '.SD' 
Note: no visible binding for global variable 'a' 
Note: no visible binding for global variable 'count' 
Note: no visible binding for global variable '.N' 
Note: no visible binding for global variable 'a' 
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5

The warnings about a can be avoided by replacing by=a with by="a", but how do I deal with the other 3 warnings?

This matters to me because these warnings clutter the screen and obscure legitimate warnings. Since the warnings are issued whenever my.func is invoked (when the JIT compiler is enabled), not just by checkUsage, I am inclined to call this a bug.

sds asked Apr 23 '13 14:04


2 Answers

UPDATE: Now resolved in v1.8.11. From NEWS:

.SD,.N,.I,.GRP and .BY are now exported (as NULL). So that NOTEs aren't produced for them by R CMD check or codetools::checkUsage via compiler::enableJIT(). utils::globalVariables() was considered, but exporting chosen. Thanks to Sam Steingold for raising, #2723.

To resolve the notes for the column name symbols count and a, both can be wrapped in quotes (even on the LHS of :=). In a fresh R session (needed because the notes appear only the first time the function is called), the following now produces no notes.

$ R
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> require(data.table)
Loading required package: data.table
data.table 1.8.11  For help type: help("data.table")
> library(compiler)
> compiler::enableJIT(3)
[1] 0
> dt <- data.table(a=c(rep(3,5),rep(4,5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = "a"]
  dt.out[, "count" := dt[, .N, by="a"]$N]
  dt.out
}
> my.func(dt)
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5
> checkUsage(my.func)
> 
Matt Dowle answered Oct 29 '22 21:10


It appears that, at this time, the only way is:

my.func <- function (dt) {
  .SD <- .N <- count <- a <- NULL  # avoid inappropriate warnings
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}

i.e., to bind locally the variables reported as unbound globals.
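
The effect of a local binding can also be checked programmatically rather than by reading the console. The following is a minimal sketch using only base R and codetools: it passes a custom function as the `report` argument of `checkUsage` so the messages are collected in a character vector instead of being printed. The helper `collect_notes`, the two toy functions, and the variable name `count_col` are hypothetical and exist only for illustration; they do not come from data.table.

```r
library(codetools)

# Hypothetical helper: run checkUsage() on a function and return its
# messages as a character vector instead of printing them.
collect_notes <- function(fun) {
  notes <- character(0)
  checkUsage(fun, report = function(msg) notes <<- c(notes, msg))
  notes
}

# Toy function with a free variable: `count_col` is not defined anywhere,
# so checkUsage() is expected to report it as an unbound global.
f_unbound <- function(x) x + count_col

# Same body, but the symbol is bound locally first (the NULL trick),
# so checkUsage() sees a local variable instead of a global one.
f_bound <- function(x) {
  count_col <- NULL
  x + count_col
}

notes_unbound <- collect_notes(f_unbound)
notes_bound <- collect_notes(f_bound)

print(notes_unbound)        # the collected message(s) for f_unbound
print(length(notes_bound))  # number of messages for f_bound
```

Collecting the messages this way makes it possible to filter out known false positives and to display only the remaining ones.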

Thanks to @GSee for the links.

sds answered Oct 29 '22 21:10