I'm trying to subset a dataset by selecting some columns from a data.table. However, my code does not work with some variations.
Here is a sample data.table:
library(data.table)
DT <- data.table(ID = 1:50,
                 Capacity = sample(100:1000, size = 50, replace = FALSE),
                 Code = sample(LETTERS[1:4], 50, replace = TRUE),
                 State = rep(c("Alabama", "Indiana", "Texas", "Nevada"), length.out = 50))
Here is a subset that works, where a numeric sequence of columns is specified using the : operator:
DT[ , 1:2]
However, specifying the same sequence of columns using seq does not work:
DT[ , seq(1:2)]
Note that this works with a data.frame but not with a data.table.
I need something along the lines of the second format because I'm subsetting based on the output of grep() and it gives the same output as the second format. What am I doing incorrectly?
Thanks!
On recent versions of data.table, numbers can be used in j to specify columns. This includes forms such as DT[ , 1:2] for a numeric range of columns. (This syntax does not work on older versions of data.table.)
So why does DT[ , 1:2] work, but DT[ , seq(1:2)] does not? The answer is buried in the code for data.table:::`[.data.table`, which includes the lines:
if (!missing(j)) {
  jsub = replace_dot_alias(substitute(j))
  root = if (is.call(jsub))
    as.character(jsub[[1L]])[1L]
  else ""
  if (root == ":" || (root %chin% c("-", "!") && is.call(jsub[[2L]]) &&
      jsub[[2L]][[1L]] == "(" && is.call(jsub[[2L]][[2L]]) &&
      jsub[[2L]][[2L]][[1L]] == ":") || (!length(all.vars(jsub)) &&
      root %chin% c("", "c", "paste", "paste0", "-", "!") &&
      missing(by))) {
    with = FALSE
  }
We can see here that data.table automatically sets with = FALSE for you when it detects the : operator at the root of j. It has no equivalent special case for seq, so DT[ , seq(1:2)] is evaluated as an ordinary expression and simply returns the vector 1 2. To use the seq syntax, we have to specify with = FALSE ourselves:
DT[ , seq(1:2), with = FALSE]
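Since the real goal is to subset on the output of grep(), here is a minimal sketch of how that might look. The small table and the pattern "^C" are made up for illustration, and the `..` prefix and .SDcols forms assume a reasonably recent data.table version; with = FALSE is the option the source code above confirms.

```r
library(data.table)

# Small stand-in table; column names mirror the question's sample data.
DT <- data.table(ID = 1:4,
                 Capacity = c(250L, 400L, 730L, 910L),
                 Code = c("A", "B", "C", "D"),
                 State = c("Alabama", "Indiana", "Texas", "Nevada"))

# Without with = FALSE, j is evaluated as an ordinary expression,
# so the result is just the integer vector that seq() returns.
plain <- DT[, seq(1:2)]

# Column positions from grep(); the pattern "^C" is only an illustration.
cols <- grep("^C", names(DT))

# Three ways to have `cols` treated as column positions:
by_with   <- DT[, cols, with = FALSE]      # explicit with = FALSE
by_prefix <- DT[, ..cols]                  # `..` looks up cols outside DT
by_sdcols <- DT[, .SD, .SDcols = cols]     # restrict .SD to those columns

names(by_with)
```

All three forms should return the same two columns here. The with = FALSE form is the closest match to the original question, while the other two avoid any ambiguity if a column happens to share the name of the variable holding the positions.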