In data.table, with = TRUE by default and j is evaluated within the frame of x, which makes it possible to use the column names as variables. When with = FALSE, j is a vector of names or positions to select.
I managed to find some examples of with = FALSE.
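As a minimal sketch of that difference when j is used for selection (the table toy and the variable cols are hypothetical and exist only for illustration):

```r
library(data.table)

# Hypothetical toy table, not the data from the question
toy <- data.table(a = 1:3, b = c(10, 20, 30))

# with = TRUE (the default): j is evaluated inside the table,
# so `a` and `b` refer to the columns themselves
sum_ab <- toy[, a + b]

# with = FALSE: j is treated as a vector of column names or positions to select
cols <- c("a", "b")
by_name     <- toy[, cols, with = FALSE]
by_position <- toy[, 2, with = FALSE]
```

Here sum_ab is a plain vector computed from the columns, while by_name and by_position are data.tables holding the selected columns.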
library(data.table)
set.seed(1234)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
## The asker's solution
# first step is to create cumsum columns
colNames <- c("x","v"); newColNames <- paste0("SUM.",colNames)
DT[, newColNames := lapply(.SD,cumsum) ,by=y, .SDcols = colNames, with=FALSE];
test <- DT[, newColNames:=lapply(.SD,cumsum) ,by=y, .SDcols=colNames, with=TRUE];
We can check that DT is:
> DT # setting `with=FALSE` - what I require
    x y  v SUM.x SUM.v
 1: 1 A 12     1    12
 2: 1 B 62     1    62
 3: 1 A 60     2    72
 4: 1 B 61     2   123
 5: 2 A 83     4   155
 6: 2 B 97     4   220
 7: 2 A  1     6   156
 8: 2 B 22     6   242
 9: 3 A 99     9   255
10: 3 B 47     9   289
11: 3 A 63    12   318
12: 3 B 49    12   338
and test is:
> test # this is when setting `with = TRUE`
    x y  v newColNames
 1: 1 A 12           1
 2: 1 B 62           1
 3: 1 A 60           2
 4: 1 B 61           2
 5: 2 A 83           4
 6: 2 B 97           4
 7: 2 A  1           6
 8: 2 B 22           6
 9: 3 A 99           9
10: 3 B 47           9
11: 3 A 63          12
12: 3 B 49          12
I don't understand why this is the result when setting with = TRUE. So my question basically is: when is with = TRUE useful?
I don't see why the default setting is with = TRUE, though there must be a good reason for it.
Many thanks!
I see your point. We've moved away from using with=TRUE|FALSE in combination with :=, since it isn't clear whether with=TRUE refers to the left-hand side or the right-hand side of :=. Instead, wrapping the LHS of := with brackets is now preferred.
DT[, x.sum:=cumsum(x)] # assign cumsum(x) to the column called "x.sum"
DT[, (target):=cumsum(x)] # assign to the name contained in target's value
As Justin alluded to, most of the time we assign to a new or existing column that we know up front. In other words, most commonly, the column being assigned to isn't held in a variable. We do that a lot, so it needs to be convenient. That said, data.table is flexible and allows you to define the target column name programmatically, too.
I suppose a case could be made that it should instead be:
DT[, "x.sum":=cumsum(x)] # assign cumsum(x) to the column called "x.sum"
DT[, x.sum:=cumsum(x)] # assign to the name contained in x.sum's contents.
However, since := is an assignment operator and j is evaluated within the scope of DT, to me, it would be confusing if DT[, x.sum:=cumsum(x)] didn't assign to the x.sum column.
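To make the distinction concrete, here is a minimal sketch of how I understand recent data.table versions to behave when a variable shares its name with the target column. The table toy and the name "other" are hypothetical and only for illustration:

```r
library(data.table)

toy <- data.table(x = c(1, 2, 3))   # hypothetical toy data

# A variable that happens to share its name with the column we want to create
x.sum <- "other"

# Bare symbol on the LHS: assigns to the column literally called "x.sum"
toy[, x.sum := cumsum(x)]

# Bracketed LHS: the variable is evaluated, so the new column is called "other"
toy[, (x.sum) := cumsum(x) * 10]

names(toy)
```

The bare symbol never looks at the variable's value, so the common case stays predictable, and the brackets are the explicit opt-in to evaluation.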
Explicit brackets, i.e. (target):=, imply some sort of evaluation, so that syntax is clearer. In my mind anyway. Of course, you can also call paste0 etc. directly in the left-hand side of := without needing with=FALSE; e.g.,
DT[, paste0("SUM.",colNames) := lapply(.SD, ...), by=...]
In short, I never use with when I'm using :=.
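Applied to the data from the question, the bracketed form might look like the sketch below. It reuses the question's DT, colNames and newColNames and drops with entirely; treat it as an illustration of the syntax described above rather than the only way to write it:

```r
library(data.table)

set.seed(1234)
DT <- data.table(x = rep(c(1, 2, 3), each = 4), y = c("A", "B"), v = sample(1:100, 12))

colNames    <- c("x", "v")
newColNames <- paste0("SUM.", colNames)

# Brackets around the LHS tell data.table to evaluate newColNames
# and use its contents ("SUM.x", "SUM.v") as the target column names
DT[, (newColNames) := lapply(.SD, cumsum), by = y, .SDcols = colNames]

names(DT)
```

This adds one cumulative-sum column per entry in colNames, computed within each group of y, which is the result the question was after with with=FALSE.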