Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamic column names in data.table

Tags:

r

data.table

I am trying to add columns to my data.table, where the names are dynamic. I addition I need to use the by argument when adding these columns. For example:

test_dtb <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10)) cn <- parse(text = "blah") test_dtb[ , eval(cn) := mean(a), by = id]  # Error in `[.data.table`(test_dtb, , `:=`(eval(cn), mean(a)), by = id) :  #  LHS of := must be a single column name when with=TRUE. When with=FALSE the LHS may be a vector of column names or positions. 

Another attempt:

cn <- "blah" test_dtb[ , cn := mean(a), by = id, with = FALSE] # Error in `[.data.table`(test_dtb, , `:=`(cn, mean(a)), by = id, with = FALSE) : 'with' must be TRUE when 'by' or 'keyby' is provided 

Update from Matthew:

This now works in v1.8.3 on R-Forge. Thanks for highlighting!
See this similar question for new examples:

Assign multiple columns using data.table, by group

like image 225
Alex Avatar asked Jul 31 '12 16:07

Alex


People also ask

How do you change a column name into a dynamic data table?

You can change the the column title, not internal name, using this code: fd. control('DataTable1'). ready().

How do I get column names dynamically in SQL?

Solution 1. The only way to do that is use build your command into a string, and use EXEC to run the result: table and column name parsing is conducted early in the SQL command execution process and have been replaced before any of the actual query is executed.

What is dynamic column?

Dynamic columns is a feature that allows one to store different sets of columns for each row in a table. It works by storing a set of columns in a blob and having a small set of functions to manipulate it.


2 Answers

From data.table 1.9.4, you can just do this:

## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out test_dtb[, (cn) := mean(a), by = id] head(test_dtb, 4) #     a  b id blah # 1: 41 19  1 54.2 # 2:  4 99  2 50.0 # 3: 49 85  3 46.7 # 4: 61  4  4 57.1 

See Details in ?:=:

DT[i, (colvector) := val]

[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector)


Original answer:

You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j argument (rather than just its left hand side)?

Something like this should do the trick:

## Your code so far library(data.table) test_dtb <- data.table(a=sample(1:100, 100),b=sample(1:100, 100),id=rep(1:10,10)) cn <- "blah"  ## One solution expr <- parse(text = paste0(cn, ":=mean(a)")) test_dtb[,eval(expr), by=id]  ## Checking the result head(test_dtb, 4) #     a  b id blah # 1: 30 26  1 38.4 # 2: 83 82  2 47.4 # 3: 47 66  3 39.5 # 4: 87 23  4 65.2 
like image 119
Josh O'Brien Avatar answered Sep 29 '22 22:09

Josh O'Brien


Expression can be constructed with bquote.

cn <- "blah" expr <- bquote(.(as.name(cn)):=mean(a)) test_dtb[,eval(expr), by=id] 
like image 31
Wojciech Sobala Avatar answered Sep 29 '22 23:09

Wojciech Sobala