
r - data.table 1.10.0 - why does a variable holding a column index not work while a literal integer column index works without with = FALSE

Tags: r, data.table

I am using data.table 1.10.0.

# install.packages("install.load") # install in order to use the load_package function
install.load::load_package("data.table", "gsubfn", "fpCompare")

# function to convert from fractions and numeric numbers to numeric (decimal)
# Source 1 begins
to_numeric <- function(n) {
    p <- c(if (length(n) == 2) 0, as.numeric(n), 0:1)
    p[1] + p[2] / p[3]
}
# Source 1 ends

Source 1 is "Convert a character vector of mixed numbers, fractions, and integers to numeric".
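As a quick check of how to_numeric behaves, here is a minimal sketch applied to three size labels. It assumes the gsubfn package is installed (for strapplyc); the vector name sizes is made up for illustration. The function pads the extracted digit groups so that it always ends up with a whole part, a numerator and a denominator.

```r
library(gsubfn)  # assumed installed; provides strapplyc()

# Same function as in the question (Source 1).
to_numeric <- function(n) {
    # Two pieces ("3", "8") get a leading 0 as the whole part;
    # the trailing 0:1 supplies numerator 0 and denominator 1 for plain integers.
    p <- c(if (length(n) == 2) 0, as.numeric(n), 0:1)
    p[1] + p[2] / p[3]
}

sizes <- c("3/8", "1 1/2", "2")        # hypothetical inputs
pieces <- strapplyc(sizes, "\\d+")     # list of digit groups per label
parsed <- sapply(pieces, to_numeric)   # 0.375, 1.5, 2
```

Because the result is floating point, the question compares it with %==% from fpCompare rather than with ==.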

max_size_aggr <- 3 / 4

water_nonair <- structure(list(`Slump (in.)` = c("1 to 2", "3 to 4", "6 to 7",
"Approximate amount of entrapped air in nonair- entrained concrete (%)"), `3/8 in.` =
c(350, 385, 410, 3), `1/2 in.` = c(335, 365, 385, 2.5), `3/4 in.` = c(315, 340, 360, 2),
`1 in.` = c(300, 325, 340, 1.5), `1 1/2 in.` = c(275, 300, 315, 1), `2 in.` =
 c(260, 285, 300, 0.5), `3 in.` = c(220, 245, 270, 0.3), `6 in.` = c(190, 210, NA, 0.2)),
 .Names = c("Slump (in.)", "3/8 in.", "1/2 in.",
 "3/4 in.", "1 in.", "1 1/2 in.", "2 in.", "3 in.", "6 in."), row.names = c(NA, -4L),
 class = c("data.table", "data.frame"))

setnames(water_nonair, c("Slump (in.)", "3/8 in.", "1/2 in.", "3/4 in.", "1 in.",
"1 1/2 in.", "2 in.", "3 in.", "6 in."))

water_nonair_col_numeric <- gsub(" in.", "", colnames(water_nonair)[2:ncol(water_nonair)])

water_nonair_col_numeric <- sapply(strapplyc(water_nonair_col_numeric, "\\d+"), to_numeric)
# Source 1

New way (data.table 1.10.0)

water_nonair_column <- which(water_nonair_col_numeric %==% max_size_aggr)+1L
# [1] 4

water_nonair[2, water_nonair_column][[1]]
# [1] 4

Why does the call below work when I write the column index literally, while the call above, where the variable also holds 4, just returns 4 instead of the cell value?

water_nonair[2, 4][[1]]
# [1] 340

Old way (data.table 1.9.6)

water_nonair[2, which(water_nonair_col_numeric %==% max_size_aggr)+1L, with = FALSE][[1]]
# [1] 340

I removed with = FALSE from the call after reading the data.table NEWS for version 1.9.8.

asked Dec 09 '16 01:12 by iembry


1 Answer

The long note 3 in the v1.9.8 NEWS starts:

When j contains no unquoted variable names (whether column names or not), with= is now automatically set to FALSE. Thus ...

But your j does contain an unquoted variable name. In fact, it is solely an unquoted variable name. So that item does not apply to it.
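To make the distinction concrete, here is a minimal sketch, assuming data.table 1.9.8 or later and a made-up two-column table. When j is a literal number or a quoted name there is no unquoted variable name in j, so with = FALSE is applied automatically. When j is a bare symbol that matches a column, it is evaluated inside the table.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)   # hypothetical example table

# j contains no unquoted variable names: columns are selected
# by position or by name, as if with = FALSE had been given.
by_position <- DT[2, 2]      # one-row data.table holding column b
by_name     <- DT[2, "b"]    # same selection by name

# j is a single unquoted symbol that is a column name:
# it is evaluated within DT and the column comes back as a vector.
col_vector <- DT[, b]
```

In the question, water_nonair_column falls into neither group: it is an unquoted symbol, but not a column of water_nonair.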

That is what options(datatable.WhenJisSymbolThenCallingScope=TRUE) was for: it let you try out the new behaviour ahead of time. Please reread that part of the same NEWS item. If you set that option, your code will work as you expected it to.

HOWEVER, please don't: yesterday I changed this, and in the development version that option is gone. A migration timeline is no longer needed, because the new strategy needs no code changes and causes no breakage. Please see the new notes in the latest development NEWS for v1.10.1; I won't copy them here, to avoid duplication.

So going forward, when j is a symbol (i.e. an unquoted variable name) you still need either with=FALSE:

water_nonair[2, water_nonair_column, with=FALSE]

or you can use the new .. prefix, added yesterday in v1.10.1:

water_nonair[2, ..water_nonair_column]

Otherwise, if j is a symbol, it must be a column name, for safety, consistency and backwards compatibility. If it is not, you'll now get the new, more helpful error message:

DT = data.table(a=1:3, b=4:6)
myCols = "b"
DT[,myCols]
Error in `[.data.table`(DT, , myCols) : 
  j (the 2nd argument inside [...]) is a single symbol but column name 
  'myCols' is not found. Perhaps you intended DT[,..myCols] or
  DT[,myCols,with=FALSE]. This difference to data.frame is deliberate 
  and explained in FAQ 1.1.
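The two working forms and the error case can be put side by side in a short sketch. It assumes a data.table release that includes the .. prefix and the new error (the change described here for v1.10.1 and later); the names DT and my_cols are illustrative only.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)
my_cols <- "b"   # hypothetical variable holding a column name

# Form 1: with = FALSE treats j as a vector of column names or positions.
via_with <- DT[, my_cols, with = FALSE]

# Form 2: the .. prefix looks my_cols up in the calling scope.
via_prefix <- DT[, ..my_cols]

# A bare symbol that is not a column name raises an error rather than
# silently returning the variable's value; tryCatch records that here.
bare_symbol <- tryCatch(DT[, my_cols], error = function(e) "error")
```

Both forms return a one-column data.table, so appending [[1]], as in the question, extracts the values.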

As mentioned in NEWS, I reran all 313 CRAN and Bioconductor packages that use data.table against data.table v1.10.1, and 2 of them do break with this change. But that is what we want, because they do have a bug: the value of j in calling scope is being returned literally, which cannot be what was intended. I've informed their maintainers. This is exactly what we wanted to reveal and improve. The other 311 packages all pass with this change. The check doesn't rely on test coverage (which is weak for many packages): the new error happens whenever j is a symbol that isn't a column, whether there's a test for the result or not.

answered Nov 20 '22 16:11 by Matt Dowle