I have a bunch of data.tables in a list. I want to apply unique() to each data.table in my list, but doing so destroys all my data.table keys.
Here's an example:
A <- data.table(a = rep(c("a","b"), each = 3), b = runif(6), key = "a")
B <- data.table(x = runif(6), b = runif(6), key = "x")
blah <- unique(A)
Here, blah still has a key, and everything is right in the world:
key(blah)
# [1] "a"
But if I add the data.tables to a list and use lapply(), the keys get destroyed:
dt.list <- list(A, B)
unique.list <- lapply(dt.list, unique) # Keys destroyed here
lapply(unique.list, key)
# [[1]]
# NULL
# [[2]]
# NULL
This probably has to do with me not really understanding what it means for keys to be assigned "by reference," as I've had other problems with keys disappearing.
So:
EDIT:
For what it's worth, the dreaded for loop works just fine, too:
unique.list <- list()
# seq_along() is safer than 1:length() when the list might be empty
for (i in seq_along(dt.list)) {
  unique.list[[i]] <- unique(dt.list[[i]])
}
lapply(unique.list, key)
# [[1]]
# [1] "a"
# [[2]]
# [1] "x"
But this is R, and for loops are evil.
Interestingly, notice the difference between the results of these two calls:
lapply(dt.list, unique)
lapply(dt.list, function(x) unique(x))
If you use the latter, the results are as you would expect.
The seemingly unexpected behavior arises because the first lapply statement invokes unique.data.frame (i.e., from {base}), while the second invokes unique.data.table.
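To make the wrapper approach concrete, here is a minimal sketch that rebuilds the question's tables and applies unique() through an anonymous function. It assumes the data.table package is installed. Whether the bare lapply(dt.list, unique) form still drops keys may depend on your R and data.table versions, so treat this as an illustration of the workaround rather than a statement about every version.

```r
library(data.table)

# Same example tables as in the question
A <- data.table(a = rep(c("a", "b"), each = 3), b = runif(6), key = "a")
B <- data.table(x = runif(6), b = runif(6), key = "x")
dt.list <- list(A, B)

# Wrap unique() in an anonymous function so that it is called
# as an ordinary unique(x) on each element of the list
unique.list <- lapply(dt.list, function(x) unique(x))

# Inspect the keys of the results
print(lapply(unique.list, key))
```

If the keys matter downstream, checking them with key() right after the lapply() call is a cheap safeguard.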
Good question. It turns out that it's documented in ?lapply (see the Note section):
For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g. bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[0L]], ...), with 0L replaced by the current integer index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.
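A small sketch of what that note means in practice: the call a function sees when lapply invokes it is not the call you would have typed yourself. The helper show_call below is a hypothetical name introduced only for this illustration, and the exact text of the recorded call can vary between R versions, so the code prints it instead of assuming a particular form.

```r
# Hypothetical helper: returns the call that invoked it, as text
show_call <- function(x) deparse(sys.call())

# Called directly, the recorded call is what was typed
direct <- show_call(1)

# Called through lapply, the recorded call is the one lapply constructs
via_lapply <- lapply(1:2, show_call)

print(direct)
print(via_lapply)
```

Anything that inspects the call (sys.call, match.call, or primitives that use the call) therefore sees lapply's constructed form, which is why the note recommends a wrapper such as function(x) is.numeric(x).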