
Why has data.table defined := rather than overloading <-?

data.table has introduced the := operator. Why not overload <-?

Matt Dowle asked Aug 11 '11 21:08




1 Answer

There are two places that <- could be 'overloaded' :

x[i, j] <- value             # 1
x[i, {colname <- value}]     # 2

The first one copies the whole of x to *tmp*, changes that working copy, and assigns back to x. That's an R thing (src/main/eval.c and subassign.c), discussed recently on r-devel. It sounded like it might be possible to change R to allow packages, or R itself, to avoid that copy to *tmp*, but that isn't currently possible, IIUC.
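For contrast with the copy described above, here is a minimal sketch of what assignment by reference with := looks like from the user's side. It assumes the data.table package is installed and uses its address() helper; the table and its column names (a, b, group) are made up for illustration.

```r
library(data.table)

# Hypothetical example table; column names are illustrative only.
DT <- data.table(a = 1:4, group = c("x", "x", "y", "y"))

before <- address(DT)   # memory address of the table object
DT[, b := a * 2L]       # add column b by reference with :=
after <- address(DT)

# The table object was modified in place rather than replaced by a copy.
identical(before, after)
```

This only checks that the table object itself is the same before and after; it says nothing about how base R's [<- behaves in any particular R version.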

The second one is what Owen's answer refers to, I think. If you accept that it's ok to do assignment by reference within j like that, then which operator? As per the comment to Owen's answer, <- and <<- are already used by users in j, so we hit upon :=.
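To illustrate why <- was not available for this, here is a small sketch of the existing use of <- inside j, assuming the data.table package is installed. The names total and share_max are hypothetical and chosen only for the example.

```r
library(data.table)

# Hypothetical example table; column names are illustrative only.
DT <- data.table(a = 1:4, group = c("x", "x", "y", "y"))

# Inside j, `<-` creates an ordinary temporary variable for the expression.
res <- DT[, {
  total <- sum(a)                      # local to this j expression
  list(share_max = max(a) / total)     # value returned for each group
}, by = group]

# DT itself gains no column called `total`.
names(DT)
```

If <- inside j meant "add a column", code like this would change meaning, which is the clash the comment on Owen's answer points at.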

Even if [<- didn't copy the whole of x, we still like := in j so we can do things like this :

DT[, {newcol1 := sum(a)
      newcol2 := a/newcol1}, by=group]

Where the new columns are added by reference to the table, and the RHS of each := is evaluated within each group. (When := within group is implemented.)


Update Oct 2012

As of 1.8.2 (on CRAN in Jul 2012), := by group was implemented for adding or updating single columns; i.e., single LHS of :=. And now in v1.8.3 (on R-Forge at the time of writing), multiple columns can be added by group; e.g.,

DT[, c("newcol1","newcol2") := .(sum(a),sum(b)), by=group] 

or, perhaps more elegantly :

DT[, `:=`(newcol1 = sum(a),
          newcol2 = sum(b)), by=group]

But the iterative multiple RHS, envisaged for a while, where the 2nd expression could use the result from the first, isn't implemented yet (FR#1492). So this will still give an error "newcol1 not found" and need to be done in two steps :

DT[, `:=`(newcol1 = sum(a),
          newcol2 = a/newcol1), by=group]
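
The two-step version mentioned above might look like the following sketch, assuming the data.table package is installed. The example data is made up; only the column names newcol1 and newcol2 come from the answer.

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(a = c(1, 2, 3, 4), group = c("x", "x", "y", "y"))

# Step 1: add the per-group sum by reference.
DT[, newcol1 := sum(a), by = group]

# Step 2: newcol1 now exists as a column, so it can be used on the RHS.
DT[, newcol2 := a / newcol1, by = group]

DT
```

The by=group in the second step is kept only to mirror the one-step form; since newcol1 is already a full-length column, the division would give the same values without it.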
Matt Dowle answered Oct 01 '22 09:10
