
When using :=, why is with=TRUE the default?

Tags: r, data.table

In data.table, with = TRUE is the default, so j is evaluated within the frame of x and the column names can be used as variables. When with = FALSE, j is instead a vector of column names or positions to select.
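As a minimal sketch of that difference when only selecting (no := involved), assuming a small made-up table; the names total, cols and sel are hypothetical and used for illustration only:

```r
library(data.table)

# Small illustrative table (values are made up for this sketch)
DT <- data.table(x = c(1, 2, 3), v = c(10L, 20L, 30L))

# with = TRUE (the default): j is an expression evaluated with the
# columns of DT visible as variables
total <- DT[, sum(v)]

# with = FALSE: j is treated as a vector of column names to select
cols <- c("x", "v")
sel <- DT[, cols, with = FALSE]
```

Here total is a single number computed from the v column, while sel is a data.table holding the columns named in cols.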

I managed to find some examples of with = FALSE.

library(data.table)

set.seed(1234)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))

## The asker's solution
# first step is to create cumsum columns
colNames <- c("x","v"); newColNames <- paste0("SUM.",colNames)
DT[, newColNames := lapply(.SD,cumsum), by=y, .SDcols=colNames, with=FALSE]
test <- DT[, newColNames := lapply(.SD,cumsum), by=y, .SDcols=colNames, with=TRUE]

We can check that DT is:

> DT                       # setting `with=FALSE` - what I require
    x y  v SUM.x SUM.v
 1: 1 A 12     1    12
 2: 1 B 62     1    62
 3: 1 A 60     2    72
 4: 1 B 61     2   123
 5: 2 A 83     4   155
 6: 2 B 97     4   220
 7: 2 A  1     6   156
 8: 2 B 22     6   242
 9: 3 A 99     9   255
10: 3 B 47     9   289
11: 3 A 63    12   318
12: 3 B 49    12   338

and test is:

> test                     # this is when setting " with = TRUE"
    x y  v newColNames
 1: 1 A 12           1
 2: 1 B 62           1
 3: 1 A 60           2
 4: 1 B 61           2
 5: 2 A 83           4
 6: 2 B 97           4
 7: 2 A  1           6
 8: 2 B 22           6
 9: 3 A 99           9
10: 3 B 47           9
11: 3 A 63          12
12: 3 B 49          12

I don't understand why I get this result when setting with = TRUE. So my question is basically: when is with = TRUE useful?

I don't see why the default setting is with = TRUE, though there must be a good reason for it.

Many thanks!

Bigchao asked Jan 20 '14 16:01


1 Answer

I see your point. We've moved away from using with=TRUE|FALSE in combination with :=, since it isn't clear whether with=TRUE refers to the left-hand side or the right-hand side of :=. Instead, wrapping the LHS of := in brackets (parentheses) is now preferred.

DT[, x.sum:=cumsum(x)]     # assign cumsum(x) to the column called "x.sum"
DT[, (target):=cumsum(x)]  # assign to the name contained in target's value 
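To make the two forms concrete, here is a small self-contained sketch. The table contents are made up for illustration, and target is simply a character variable holding the desired column name:

```r
library(data.table)

# Illustrative data only
DT <- data.table(x = c(1, 2, 3, 4), y = c("A", "B", "A", "B"))

target <- "x.sum"

# Brackets around the LHS: the variable is evaluated, so the new
# column is named after its value, "x.sum"
DT[, (target) := cumsum(x)]

# Bare symbol on the LHS: the new column is literally called "target"
DT[, target := cumsum(x)]

names(DT)
```

After both calls, DT has a column named x.sum and a separate column named target, each holding the running sum of x.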

As Justin alluded to, most of the time we assign to a new or existing column that we know up front. In other words, most commonly, the column being assigned to isn't held in a variable. We do that a lot, so it needs to be convenient. That said, data.table is flexible and allows you to define the target column name programmatically, too.

I suppose a case could be made that it should be:

DT[, "x.sum":=cumsum(x)]   # assign cumsum(x) to the column called "x.sum"
DT[, x.sum:=cumsum(x)]     # assign to the name contained in x.sum's contents.

However, since := is an assignment operator and j is evaluated within the scope of DT, to me, it would be confusing if DT[, x.sum:=cumsum(x)] didn't assign to the x.sum column.

Explicit brackets, i.e. (target):=, imply some sort of evaluation, so that syntax is clearer, in my mind anyway. Of course, you can also call paste0 etc. directly on the left-hand side of := without needing with=FALSE; e.g.,

DT[, paste0("SUM.",colNames) := lapply(.SD, ...), by=...]
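Filled in with the pieces from the question (cumsum, by = y and .SDcols = colNames), that pattern might look like the sketch below. The v column uses 1:12 instead of sample() so the result does not depend on the random number generator; that substitution is an assumption made for this example only:

```r
library(data.table)

# Same shape as the question's table, with a deterministic v column
DT <- data.table(
  x = rep(c(1, 2, 3), each = 4),
  y = rep(c("A", "B"), times = 6),
  v = 1:12
)

colNames <- c("x", "v")

# The LHS is a call, so it is evaluated to give the new column names
# "SUM.x" and "SUM.v"; no with= argument is needed
DT[, paste0("SUM.", colNames) := lapply(.SD, cumsum), by = y, .SDcols = colNames]

names(DT)
```

This adds one cumulative-sum column per entry in colNames, computed within each group of y.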

In short, I never use with when I'm using :=.

Matt Dowle answered Oct 20 '22 12:10