How do you refer to variables in a data.table
if the variable names are stored in a character vector? For instance, this works for a data.frame
:
df <- data.frame(col1 = 1:3) colname <- "col1" df[colname] <- 4:6 df # col1 # 1 4 # 2 5 # 3 6
How can I perform this same operation for a data.table, either with or without :=
notation? The obvious thing of dt[ , list(colname)]
doesn't work (nor did I expect it to).
setDT converts lists (both named and unnamed) and data. frames to data. tables by reference. This feature was requested on Stackoverflow.
A column can be added to an existing data table using := operator. Here ':' represents the fixed values and '=' represents the assignment of values.
To add or insert observation/row to an existing Data Frame in R, we use rbind() function. We can add single or multiple observations/rows to a Data Frame in R using rbind() function.
Two ways to programmatically select variable(s):
with = FALSE
:
DT = data.table(col1 = 1:3) colname = "col1" DT[, colname, with = FALSE] # col1 # 1: 1 # 2: 2 # 3: 3
'dot dot' (..
) prefix:
DT[, ..colname] # col1 # 1: 1 # 2: 2 # 3: 3
For further description of the 'dot dot' (..
) notation, see New Features in 1.10.2 (it is currently not described in help text).
To assign to variable(s), wrap the LHS of :=
in parentheses:
DT[, (colname) := 4:6] # col1 # 1: 4 # 2: 5 # 3: 6
The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i
was present, it would subassign by reference. The parens around (colname)
is a shorthand introduced in version v1.9.4 on CRAN Oct 2014. Here is the news item:
Using
with = FALSE
with:=
is now deprecated in all cases, given that wrapping the LHS of:=
with parentheses has been preferred for some time.
colVar = "col1"
DT[, (colVar) := 1] # please change to this DT[, c("col1", "col2") := 1] # no change DT[, 2:4 := 1] # no change DT[, c("col1","col2") := list(sum(a), mean(b))] # no change DT[, `:=`(...), by = ...] # no change
See also Details section in ?`:=`
:
DT[i, (colnamevector) := value] # [...] The parens are enough to stop the LHS being a symbol
And to answer further question in comment, here's one way (as usual there are many ways) :
DT[, colname := cumsum(get(colname)), with = FALSE] # col1 # 1: 4 # 2: 9 # 3: 15
or, you might find it easier to read, write and debug just to eval
a paste
, similar to constructing a dynamic SQL statement to send to a server :
expr = paste0("DT[,",colname,":=cumsum(",colname,")]") expr # [1] "DT[,col1:=cumsum(col1)]" eval(parse(text=expr)) # col1 # 1: 4 # 2: 13 # 3: 28
If you do that a lot, you can define a helper function EVAL
:
EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2)) EVAL("DT[,",colname,":=cumsum(",colname,")]") # col1 # 1: 4 # 2: 17 # 3: 45
Now that data.table
1.8.2 automatically optimizes j
for efficiency, it may be preferable to use the eval
method. The get()
in j
prevents some optimizations, for example.
Or, there is set()
. A low overhead, functional form of :=
, which would be fine here. See ?set
.
set(DT, j = colname, value = cumsum(DT[[colname]])) DT # col1 # 1: 4 # 2: 21 # 3: 66
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With