In data.table, the following have equivalent results:
dt1 <- data.table(iris)
dt1[, Long.Petal := Petal.Length > mean(Petal.Length)]
dt1[, Wide.Petal := Petal.Width > mean(Petal.Width)]
and
dt2 <- data.table(iris)
dt2[, `:=`(
Long.Petal = Petal.Length > mean(Petal.Length),
Wide.Petal = Petal.Width > mean(Petal.Width)
)]
When working with a large data set, is there a performance advantage (in terms of memory or running time or both) to the latter form? Or is the overhead minimal, and it's just a matter of style and readability?
Two things to take into account are a) the overhead of each call to [.data.table, and b) the time spent actually running the code inside [.data.table.
For a couple of calls, it shouldn't really matter. But if you're doing this hundreds or thousands of times (e.g., in a for-loop), then the two-call form can be slower, mostly due to the time spent dispatching [.data.table. In that case, as long as there's no grouping, set() is a much better option.
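A minimal sketch of that alternative, assuming a plain ungrouped update; the column names simply mirror the example above:

library(data.table)
dt <- data.table(iris)

# set() updates columns by reference without the [.data.table dispatch,
# which matters when the update is repeated many times in a loop
new_cols <- c("Long.Petal", "Wide.Petal")      # columns to create
src_cols <- c("Petal.Length", "Petal.Width")   # columns they are derived from

for (i in seq_along(new_cols)) {
  set(dt, j = new_cols[i], value = dt[[src_cols[i]]] > mean(dt[[src_cols[i]]]))
}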
In any case, these things are quite easy to benchmark for yourself on your dataset. Calling Rprof(); <your_code>; Rprof(NULL); summaryRprof()
should give an idea of the time taken and where most of it is being spent.
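As a rough sketch of that approach (the repetition count of 1000 is arbitrary, only there to make the dispatch overhead visible in the profile):

library(data.table)
dt <- data.table(iris)

# Profile the two-call form
Rprof("two_calls.out")
for (k in 1:1000) {
  dt[, Long.Petal := Petal.Length > mean(Petal.Length)]
  dt[, Wide.Petal := Petal.Width > mean(Petal.Width)]
}
Rprof(NULL)
summaryRprof("two_calls.out")

# Profile the single-call `:=`() form
Rprof("one_call.out")
for (k in 1:1000) {
  dt[, `:=`(
    Long.Petal = Petal.Length > mean(Petal.Length),
    Wide.Petal = Petal.Width > mean(Petal.Width)
  )]
}
Rprof(NULL)
summaryRprof("one_call.out")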