
Why is := faster than `:=`()?

Tags: r, data.table

Usually, I use the functional form `:=`() to compute multiple columns in a data.table, thinking that this is the most efficient method. But I recently discovered that, at least on my computer, it is slower than simply using := repeatedly.

I'm guessing that there might be some overhead with the functional form of := but is that the entire reason why it's slower? I'm simply asking out of curiosity in order to understand the internals of data.table better.

library(data.table)


n <- 5000000
dt <- data.table(a = rnorm(n),
                 b = rnorm(n),
                 c = rnorm(n))

dt_a <- copy(dt)

system.time({
  dt_a[, d := a + b]
  dt_a[, e := b + c]
  dt_a[, f := a + c]
})
#>    user  system elapsed 
#>   0.076   0.060   0.136

dt_b <- copy(dt)

system.time({
  dt_b[, `:=`(d = a + b,
              e = b + c,
              f = a + c)]
})
#>    user  system elapsed 
#>   0.096   0.116   0.211

Update:

One interesting property is that the time difference between := and `:=`() is relative: the functional form is slower by a factor of about 1.5 to 2 regardless of n. If this were simply due to function-call overhead, as some suggest, I would expect the time difference to be a fixed value.

library(data.table)


n <- 20000000
dt <- data.table(a = rnorm(n),
                 b = rnorm(n),
                 c = rnorm(n))

dt_a <- copy(dt)

system.time({
  dt_a[, d := a + b]
  dt_a[, e := b + c]
  dt_a[, f := a + c]
})
#>    user  system elapsed 
#>   0.163   0.208   0.371

dt_b <- copy(dt)

system.time({
  dt_b[, `:=`(d = a + b,
              e = b + c,
              f = a + c)]
})
#>    user  system elapsed 
#>   0.284   0.404   0.688
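
As a quick check on the update's argument, the ratios and absolute gaps can be computed from the elapsed times reported above. This is only arithmetic on the two runs shown, so it is a rough indication rather than a proper benchmark. The variable names are illustrative.

```r
# Elapsed times (seconds) copied from the two runs above
elapsed <- data.frame(
  n          = c(5e6, 2e7),
  repeated   = c(0.136, 0.371),  # three separate := calls
  functional = c(0.211, 0.688)   # one `:=`() call
)

# Relative slowdown of the functional form
elapsed$ratio <- elapsed$functional / elapsed$repeated

# Absolute gap in seconds
elapsed$gap <- elapsed$functional - elapsed$repeated

print(elapsed)
```

The gap grows from 0.075 s to 0.317 s as n goes from 5 million to 20 million, roughly in step with n, which is hard to reconcile with a fixed per-call overhead.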
petrovski asked Jun 19 '19 11:06

1 Answer

Some timings:

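bench::mark(
    # chained single-column := calls
    chaining = DT0[, d := a + b][, e := b + c][, f := a + c],
    # one := with a character vector on the LHS and a list on the RHS
    assign = DT1[, c("d", "e", "f") := .(a+b, b+c, a+c)],
    # chained calls again, with := written in prefix form
    assign2 = DT1.1[, `:=` (d, a + b)][, `:=` (e, b + c)][, `:=` (f, a + c)],
    # set() on one column at a time
    use_set = {
        set(DT2, NULL, "d", DT2[["a"]]+DT2[["b"]])
        set(DT2, NULL, "e", DT2[["b"]]+DT2[["c"]])
        set(DT2, NULL, "f", DT2[["a"]]+DT2[["c"]])
    },
    # functional form with several named arguments
    functional = DT3[, `:=`(d = a + b, e = b + c, f = a + c)]
)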

Timings and memory usage:

  expression     min    mean  median     max `itr/sec` mem_alloc  n_gc n_itr total_time result           memory      time   gc       
  <chr>      <bch:t> <bch:t> <bch:t> <bch:t>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list>           <list>      <list> <list>   
1 chaining     180ms   180ms   180ms   180ms      5.54     458MB     1     1      180ms <data.table [20~ <Rprofmem ~ <bch:~ <tibble ~
2 assign       320ms   320ms   320ms   320ms      3.12     916MB     1     1      320ms <data.table [20~ <Rprofmem ~ <bch:~ <tibble ~
3 assign2      188ms   188ms   188ms   188ms      5.33     458MB     1     1      188ms <data.table [20~ <Rprofmem ~ <bch:~ <tibble ~
4 use_set      322ms   323ms   323ms   323ms      3.10     916MB     0     2      645ms <data.table [20~ <Rprofmem ~ <bch:~ <tibble ~
5 functional   331ms   331ms   331ms   331ms      3.02     916MB     1     1      331ms <data.table [20~ <Rprofmem ~ <bch:~ <tibble ~
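
The mem_alloc column can be sanity-checked with a little arithmetic. The sketch below assumes each new column is a double vector at 8 bytes per element and that bench reports sizes in units of 1024^2 bytes. Vector headers are ignored.

```r
n <- 2e7               # rows, the n used for these timings
bytes_per_double <- 8  # assumption: plain double columns
new_cols <- 3          # d, e and f

# Size of one copy of the three new columns, in units of 1024^2 bytes
one_copy_mb <- n * bytes_per_double * new_cols / 1024^2

round(one_copy_mb)      # about 458
round(2 * one_copy_mb)  # about 916
```

So chaining and assign2 allocate about as much as the three new columns themselves, while assign, use_set and functional allocate about twice that. This is consistent with the new columns being materialised twice in those forms, though the table alone does not show where the extra allocation happens.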

Data (set up before running the benchmark):

library(data.table) #data.table_1.12.2  
set.seed(0L)
n <- 2e7
DT <- data.table(a=rnorm(n), b=rnorm(n), c=rnorm(n))
DT0 <- copy(DT)
DT1 <- copy(DT)
DT1.1 <- copy(DT)
DT2 <- copy(DT)
DT3 <- copy(DT)
answered Oct 01 '22 01:10 (2 revs)