
data.table `:=` assignment expressions with dynamic inputs (existing columns) and outputs (new column names)

Tags: r, data.table

Note: The precise problem I hit in this question does not apply to recent versions of data.table. If you want to do something like what is described in the title, check out the corresponding question in the package FAQ, 1.6: "OK, but I don’t know the expressions in advance. How do I programatically pass them in?"

I have seen an answer that illustrates how to construct an expression to be evaluated in

DT[,j=eval(expr)]

I am using this with an assignment, `:=`(mycol=my_calculation), and I'm wondering...

  • How can I assign the name "mycol" dynamically?
  • What is the correct way to let "my_calculation" take a dynamically-determined set of columns?

By "dynamically", I mean "determined after I write the code for my expr".

New example

EDIT: To better illustrate the issue, here is a different example. Look in the edit history to see the original.

require(data.table)
require(plyr)
options(datatable.verbose=TRUE)
DT <- CJ(a=0:1,b=0:1,y=2)

# setup:
expr  <- as.quoted(paste(expression(get(col_in_one)+get(col_in_two))))[[1]]

# usage: 
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,(col_out):=eval(expr)] # fails, should take the form j=eval(expr)

I want to keep the setup and usage stages separate, so my code is easier to maintain. My real expression is messier than this example (which just adds two columns).

Questions

First question: How can I make the assigned-to column, "col_out", dynamic? I mean: I want to specify both "col_in_*" and "col_out" on the fly.

I have tried creating various expressions in "expr", but as.quoted throws an error about not putting certain stuff to the left of the = symbol.

Second question: How can I avoid the warnings against using get?

The warnings suggest using .SDcols, to let [.data.table know which columns I am using. However, if I use the .SDcols argument, another warning says there's no point doing that unless .SD is being used.

Tentative solution

The solutions I have so far are...

# Ricardo + eddi:
expr2 <- as.quoted(paste(expression(`:=`(
  Vtmp=.SD[[col_in_one]]+.SD[[col_in_two]]))))[[1]]

# usage
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,eval(expr2),.SDcols=c(col_in_one,col_in_two)]
setnames(DT,'Vtmp',col_out)

This still involves the minor annoyance of doing the operation in two steps and keeping track of "Vtmp", so the first question is still partly open.

Frank asked Oct 09 '13 15:10



2 Answers

Maybe I don't understand the problem well, but does this suffice:

DT[, (col_out) := .SD[[col_in_one]]+.SD[[col_in_two]],
     .SDcols = c(col_in_one,col_in_two)]
DT
#   a b y bah
#1: 0 0 2   0
#2: 0 1 2   1
#3: 1 0 2   1
#4: 1 1 2   2
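If the set of input columns is itself dynamic (not always exactly two), the same .SDcols pattern can be generalized. The following is only a sketch: the names cols_in and "total" are made up for illustration, and Reduce with `+` stands in for whatever calculation is really needed.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

# Hypothetical inputs, chosen after the code above was written
cols_in <- c("a", "b", "y")  # any number of existing columns
col_out <- "total"           # name of the new column

# .SD holds only the columns listed in .SDcols;
# Reduce adds them together element-wise, one column at a time
DT[, (col_out) := Reduce(`+`, .SD), .SDcols = cols_in]
```

Because .SD is restricted to cols_in, the calculation never needs to mention individual column names.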

To answer the edited question, to get the eval to work, use .SD as environment:

DT[, (col_out) := eval(expr, .SD)]
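For reference, here is a self-contained sketch of that line, keeping the setup and usage stages separate. It assumes the expression is built with base R's quote() rather than plyr's as.quoted; that substitution is mine, not part of the question.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

# setup: an unevaluated expression that looks columns up by name
expr <- quote(get(col_in_one) + get(col_in_two))

# usage: names are decided later
col_in_one <- "a"
col_in_two <- "b"
col_out    <- "bah"

# .SD supplies the columns; col_in_one and col_in_two are found
# in the calling environment
DT[, (col_out) := eval(expr, .SD)]
```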

Also, see the question "eval and quote in data.table" and the update there.

eddi answered Oct 05 '22 00:10



The simplest way is to set the name AFTER you evaluate the expression. After all, the time to execute that is constant and nearly 0.

someDummyVar <- "tempColName_XCWF5D"
DT[, (someDummyVar) := eval(expr)]

setnames(DT, someDummyVar, RealColumnName)
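A runnable sketch of this rename-afterwards pattern might look like the following. The expression quote(a * b) and the name "bah" are placeholders of mine, and the expression is evaluated with .SD as its environment (the form used in the other answer) so that it does not depend on how a particular data.table version resolves a bare eval(expr).

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

expr           <- quote(a * b)          # placeholder expression
someDummyVar   <- "tempColName_XCWF5D"  # temporary column name
RealColumnName <- "bah"                 # final name, decided later

# Step 1: assign under the temporary name
DT[, (someDummyVar) := eval(expr, .SD)]

# Step 2: rename by reference, without copying the table
setnames(DT, someDummyVar, RealColumnName)
```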

As for question two: Don't turn on verbose warnings and you won't get verbose warnings ;)

options(datatable.verbose=FALSE)

As for Reduce : try posting that as a separate and simplified question so that it is easier to follow what you are doing (outside of the eval issues)

Ricardo Saporta answered Oct 04 '22 23:10
