
Combining chaining and assignment by reference in a data.table

Tags: r, data.table

Is it possible to combine chaining and assignment by reference in a data.table?

For example, I would like to do this:

DT[a == 1][b == 0, c := 2]

However, this leaves the original table unchanged: DT[a == 1] seems to create a temporary table, and it is that temporary table which is subsequently changed and returned.
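To make the behaviour concrete, here is a minimal sketch. The five-row mock table and the name `res` are assumptions made up for illustration, not part of the real data:

```r
library(data.table)

# Hypothetical mock data, for illustration only
DT <- data.table(a = c(1, 1, 1, 2, 2), b = c(0, 2, 0, 1, 1))

# The first [] returns a new table holding the rows where a == 1;
# := then adds column c to that new table, not to DT
res <- DT[a == 1][b == 0, c := 2]

"c" %in% names(DT)   # is the original table changed?
"c" %in% names(res)  # is the temporary result changed?
nrow(res)            # the result only holds the a == 1 rows
```

If the temporary-table explanation is right, only `res` should carry the new column, and it should contain just the rows with a == 1.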

I would rather not do

DT[a == 1 & b == 0, c := 2]

as this is very slow, and I would also rather avoid

 DT <- DT[a == 1][b == 0, c := 2]

as I would prefer to do the assignment by reference. This question is part of question [1], where it was left unanswered.

[1] Conditional binary join and update by reference using the data.table package

castle asked Apr 22 '15 22:04



1 Answer

I'm not sure why you think that, even if DT[a == 1][b == 0, c := 2] worked in theory, it would be more efficient than DT[a == 1 & b == 0, c := 2].

Either way, the most efficient solution in your case would be to key by both a and b and do the assignment by reference while performing a binary join on both keys:

DT <- data.table(a = c(1, 1, 1, 2, 2), b = c(0, 2, 0, 1, 1)) ## mock data
setkey(DT, a, b) ## keying by both `a` and `b`
DT[J(1, 0), c := 2] ## Update `c` by reference
DT
#    a b  c
# 1: 1 0  2
# 2: 1 0  2
# 3: 1 2 NA
# 4: 2 1 NA
# 5: 2 1 NA
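If you would rather keep the "filter on a first, then on b" flavour of the chained version without keying, one possible sketch is to carry row numbers through the two steps and only then assign by reference. The `rows` variable is a name I made up for illustration, and I have not benchmarked this against the keyed join above:

```r
library(data.table)

# Same mock data as above, left unkeyed
DT <- data.table(a = c(1, 1, 1, 2, 2), b = c(0, 2, 0, 1, 1))

# Step 1: row numbers in DT where a == 1 (which = TRUE returns positions, not rows)
rows <- DT[a == 1, which = TRUE]

# Step 2: within those rows only, keep the ones where b == 0
rows <- rows[DT$b[rows] == 0]

# Step 3: update the original table by reference at those positions
DT[rows, c := 2]
DT
```

Because the final call indexes DT itself, := modifies the original table, whereas in the chained version it modifies the temporary one.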
David Arenburg answered Oct 16 '22 01:10