First of all, thanks for implementing shift() in data.table 1.9.6.
When I have many different groups, shift() is, contrary to my expectations, slower than my old code:
library(data.table)
library(microbenchmark)
set.seed(1)
mg <- data.table(expand.grid(year = 2012:2016, id = 1:1000),
                 value = rnorm(5000))
microbenchmark(dt194 = mg[, l1 := c(value[-1], NA), by = .(id)],
               dt196 = mg[, l2 := shift(value, n = 1,
                                        type = "lead"), by = .(id)])
## Unit: milliseconds
## expr min lq mean median uq max neval
## dt194 4.93735 5.236034 5.718654 5.623736 5.74395 9.555922 100
## dt196 83.92612 87.530404 91.700317 90.953947 91.43783 257.473242 100
A detailed script is here: https://github.com/nachti/datatable_test/blob/master/leadtest.R
Did I misapply shift()?
Edit: Avoiding := doesn't help (@MichaelChirico):
microbenchmark(dt194 = mg[, c(value[-1], NA), by = id],
               dt196 = mg[, shift(value, n = 1,
                                  type = "lead"), by = id])
## Unit: milliseconds
## expr min lq mean median uq max neval
## dt194 5.161973 5.429927 5.78047 5.698263 5.798132 10.42217 100
## dt196 79.526981 87.914502 92.18144 91.240949 91.896799 266.04031 100
Apart from this, using := is part of the task ...
In data.table version 1.14.3 this has been resolved, and shift() is now faster than ever.
library(data.table)
library(microbenchmark)
set.seed(1)
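# Assumption: shift_no_opt is not defined in the original post. It is taken
# here to be a plain alias of shift(), so that the by-group optimization,
# which recognizes shift by name, is not applied (mimicking v1.9.6 behavior).
shift_no_opt = shift
mg = data.table(expand.grid(year=2012:2016, id=1:1000),
                value=rnorm(5000))
microbenchmark(v1.9.4  = mg[, c(value[-1], NA), by=id],
               v1.9.6  = mg[, shift_no_opt(value, n=1, type="lead"), by=id],
               v1.14.3 = mg[, shift(value, n=1, type="lead"), by=id],
               unit="ms")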
# Unit: milliseconds
# expr min lq mean median uq max neval
# v1.9.4 3.6600 3.8250 4.4930 4.1720 4.9490 11.700 100
# v1.9.6 18.5400 19.1800 21.5100 20.6900 23.4200 29.040 100
# v1.14.3 0.4826 0.5586 0.6586 0.6329 0.7348 1.318 100