
Select / assign to data.table when variable names are stored in a character vector

Tags: r, data.table

How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:

df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
#   col1
# 1    4
# 2    5
# 3    6

How can I perform this same operation for a data.table, either with or without := notation? The obvious thing of dt[ , list(colname)] doesn't work (nor did I expect it to).

frankc asked Sep 12 '12 15:09



1 Answer

Two ways to programmatically select variable(s):

  1. with = FALSE:

     DT = data.table(col1 = 1:3)
     colname = "col1"
     DT[, colname, with = FALSE]
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3

  2. 'dot dot' (..) prefix:

     DT[, ..colname]
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3

For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (it is currently not described in help text).
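The same two forms also accept a character vector holding several names. The following is a minimal sketch, assuming a made-up table `dt_sel` and vector `cols` (neither is from the question); it also shows `.SDcols` and `[[`, two other standard ways to select by name:

```r
library(data.table)

# Hypothetical example table; dt_sel and cols are illustrative names only.
dt_sel <- data.table(a = 1:3, b = 4:6, c = letters[1:3])
cols <- c("a", "b")

sel_with <- dt_sel[, cols, with = FALSE]   # data.table with columns a and b
sel_dots <- dt_sel[, ..cols]               # same selection via the .. prefix
sel_sd   <- dt_sel[, .SD, .SDcols = cols]  # same selection via .SDcols
vec_a    <- dt_sel[["a"]]                  # a single column as a plain vector
```

The first three return a data.table; `[[` returns the underlying vector, which is often what you want when passing one column to another function.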

To assign to variable(s), wrap the LHS of := in parentheses:

DT[, (colname) := 4:6]
#    col1
# 1:    4
# 2:    5
# 3:    6

The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would subassign by reference. The parens around (colname) are a shorthand introduced in version v1.9.4 on CRAN Oct 2014. Here is the news item:

Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.

colVar = "col1"
DT[, (colVar) := 1]                              # please change to this
DT[, c("col1", "col2") := 1]                     # no change
DT[, 2:4 := 1]                                   # no change
DT[, c("col1","col2") := list(sum(a), mean(b))]  # no change
DT[, `:=`(...), by = ...]                        # no change

See also the Details section in ?`:=`:

DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol
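To make the difference between a whole-column plonk and a subassignment with i concrete, here is a small sketch. The table `dt_asg` and the vector `cols` are invented for illustration and are not part of the original answer:

```r
library(data.table)

# Hypothetical example table; dt_asg and cols are illustrative names only.
dt_asg <- data.table(id = 1:4, x = c(10, 20, 30, 40), y = c(1, 2, 3, 4))
cols <- c("x", "y")

# No i: every row of both columns is replaced (a "plonk" of each column).
dt_asg[, (cols) := list(x * 2, y * 2)]

# With i: only rows where id > 2 are updated, by reference.
dt_asg[id > 2, (cols) := 0]
```

After both calls, rows 1 and 2 keep their doubled values and rows 3 and 4 are zero in both columns.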

To answer the further question in the comments, here is one way (as usual, there are many):

DT[, (colname) := cumsum(get(colname))]
#    col1
# 1:    4
# 2:    9
# 3:   15

Or you might find it easier to read, write and debug if you simply eval a pasted string, similar to constructing a dynamic SQL statement to send to a server:

expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"

eval(parse(text=expr))
#    col1
# 1:    4
# 2:   13
# 3:   28

If you do that a lot, you can define a helper function EVAL:

EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))

EVAL("DT[,",colname,":=cumsum(",colname,")]")
#    col1
# 1:    4
# 2:   17
# 3:   45

Now that data.table 1.8.2 automatically optimizes j for efficiency, it may be preferable to use the eval method. The get() in j prevents some optimizations, for example.
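Another commonly used way to avoid get() in j, not mentioned in the original answer, is to combine .SD with .SDcols. The sketch below assumes a made-up table `dt_cum` and vector `cols`; it makes no claim about which approach is faster:

```r
library(data.table)

# Hypothetical example table; dt_cum and cols are illustrative names only.
dt_cum <- data.table(col1 = 4:6, col2 = c(1, 1, 1))
cols <- c("col1", "col2")

# .SD holds only the columns named in .SDcols; lapply returns one
# result per column, which := assigns back to the same names.
dt_cum[, (cols) := lapply(.SD, cumsum), .SDcols = cols]
```

This form extends naturally to several columns, whereas the get() version handles one name at a time.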

Or there is set(), a low-overhead, functional form of :=, which would be fine here. See ?set.

set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
#    col1
# 1:    4
# 2:   21
# 3:   66
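Because set() takes the column name as an ordinary argument, it fits naturally in a loop over names. A minimal sketch, assuming a made-up table `dt_set` (not from the original answer):

```r
library(data.table)

# Hypothetical example table; dt_set is an illustrative name only.
dt_set <- data.table(a = 1:3, b = c(2, 4, 6))

# Replace each column with its running total, one column per iteration.
for (col in names(dt_set)) {
  set(dt_set, j = col, value = cumsum(dt_set[[col]]))
}
```

Each iteration modifies `dt_set` by reference, so no reassignment of the table is needed.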
Matt Dowle answered Oct 20 '22 04:10