I have encountered some unintuitive behavior of keys in the data.table
package. Here is an example:
library(data.table)
foo <- data.table(a = c(1:4), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)
Then there is one alarming result of key():
key(foo[, .(mean(c + d)), by = .(b)]) # result is "b".
key(foo[, .(mean(c + d)), by = .(a)]) # result is "a". (!!)
Then there is another example, which produces different, more reasonable results.
foo <- data.table(a = c(4:1), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)
key(foo[, .(mean(c + d)), by = .(b)]) # result is "b".
key(foo[, .(mean(c + d)), by = .(a)]) # result is NULL
I admit I'm confused. My guess is that key() somehow checks whether the resulting table needed to be sorted by the columns in by and, if it did not, assumes the table is keyed.
Is it a feature? Is it a bug?
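One way to probe that guess is to check whether the grouped result is already sorted by the grouping column. This is only a minimal sketch: the names foo1, foo2, res1 and res2 are made up for illustration, and is.unsorted() is base R, not part of data.table.

```r
library(data.table)

# Same data as the two examples above; only the order of column a differs.
foo1 <- data.table(a = 1:4, b = 2:5, c = 3:6, d = 4:7)
foo2 <- data.table(a = 4:1, b = 2:5, c = 3:6, d = 4:7)
setkey(foo1, b)
setkey(foo2, b)

# by keeps groups in order of first appearance, so the result inherits
# whatever order column a had in the keyed table.
res1 <- foo1[, .(V1 = mean(c + d)), by = .(a)]
res2 <- foo2[, .(V1 = mean(c + d)), by = .(a)]

is.unsorted(res1$a)  # FALSE: a comes out as 1, 2, 3, 4
is.unsorted(res2$a)  # TRUE:  a comes out as 4, 3, 2, 1
```

The result that reported key "a" is exactly the one whose a column is already in non-decreasing order, which is consistent with the guess.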
Is it a feature? Is it a bug?
In the first example you get key = "a" because the result of that query happened to be ordered so that column a was in non-decreasing order. For that reason, we could call this behaviour a feature.
The problem is that silently creating a key is not always desired, so this behaviour has been changed since you asked the question.
Now (as of 1.12.0), running the code from the first chunk drops the key and ignores the fact that the results are ordered by a.
library(data.table)
foo <- data.table(a = c(1:4), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)
key(foo[, .(mean(c + d)), by = .(b)])
#[1] "b"
key(foo[, .(mean(c + d)), by = .(a)])
#NULL
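If you do want the result keyed by the grouping column, you can ask for it explicitly rather than relying on the incidental ordering. A minimal sketch, assuming the same foo as above (the names res_keyby, res_setkey and V1 are chosen here only for illustration): keyby groups like by but also sorts the result and sets its key, and setkey() can be applied to a result afterwards.

```r
library(data.table)

foo <- data.table(a = c(4:1), b = c(2:5), c = c(3:6), d = c(4:7))
setkey(foo, b)

# Option 1: keyby sorts the grouped result by a and marks it as keyed.
res_keyby <- foo[, .(V1 = mean(c + d)), keyby = .(a)]
key(res_keyby)
#[1] "a"

# Option 2: group with by, then set the key on the result explicitly.
res_setkey <- foo[, .(V1 = mean(c + d)), by = .(a)]
setkey(res_setkey, a)
key(res_setkey)
#[1] "a"
```

Either way, the key is the outcome of an explicit request, so it does not depend on how the input rows happened to be ordered.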