Because SO is a bit slow lately, I'm posting an easy question. I would appreciate it if big fishes stayed on the bench for this one and give rookies a chance to respond.
Sometimes we have objects that have a ridiculous amount of large list elements (vectors). How would you "unlist" this object into a single vector. Show proof that your method is faster than unlist()
.
If you don't need names and your list is one level deep, then if you can beat
.Internal(unlist(your_list, FALSE, FALSE))
I will vote up everything you do on SO for the next 1 year!!!
[Update: if non-unique names are needed and the list is not recursive, here is a version which improves over the unlist 100 times
myunlist <- function(l){
names <- names(l)
vec <- unlist(l, F, F)
reps <- unlist(lapply(l, length), F, F)
names(vec) <- rep(names, reps)
vec
}
myunlist(list(a=1:3, b=2))
a a a b
1 2 3 2
> tl <- list(a = 1:20000, b = 1:5000, c = 2:30)
> system.time(for(i in 1:200) unlist(tl))
user system elapsed
22.97 0.00 23.00
> system.time(for(i in 1:200) myunlist(tl))
user system elapsed
0.2 0.0 0.2
> system.time(for(i in 1:200) unlist(tl, F, F))
user system elapsed
0.02 0.00 0.02
]
[Update2: Responce to challenge Nr3 from Richie Cotton.
bigList3 <- replicate(500, rnorm(1e3), simplify = F)
unlist_vit <- function(l){
names(l) <- NULL
do.call(c, l)
}
library(rbenchmark)
benchmark(unlist = unlist(bigList3, FALSE, FALSE),
rjc = unlist_rjc(bigList3),
vit = unlist_vit(bigList3),
order = "elapsed",
replications = 100,
columns = c("test", "relative", "elapsed")
)
test relative elapsed
1 unlist 1.0000 2.06
3 vit 1.4369 2.96
2 rjc 3.5146 7.24
]
PS: I assume a "big fish" is the one with more reputation than you. So I am pretty much small here :).
A non-unlist()
solution would have to be pretty darned fast to beat unlist()
would it not? Here it takes less than two second to unlist a list with 2000 numeric vectors of length 100,000 each.
> bigList2 <- as.list(data.frame(matrix(rep(rnorm(1000000), times = 200),
+ ncol = 2000)))
> print(object.size(bigList2), units = "Gb")
1.5 Gb
> system.time(foo <- unlist(bigList2, use.names = FALSE))
user system elapsed
1.897 0.000 2.019
With bigList2
and foo
in my workspace, R is using ~9Gb of my available memory. The key is use.names = FALSE
. Without it unlist()
is painfully slow. Exactly how slow I'm still waiting to find out...
We can speed this up a little bit more by setting recursive = FALSE
and then we have effectively the same as VitoshKa's answer (two representative timings):
> system.time(foo <- unlist(bigList2, recursive = FALSE, use.names = FALSE))
user system elapsed
1.379 0.001 1.416
> system.time(foo <- .Internal(unlist(bigList2, FALSE, FALSE)))
user system elapsed
1.335 0.000 1.344
... finally the use.names = TRUE
version finished...:
> system.time(foo <- unlist(bigList2, use = TRUE))
user system elapsed
2307.839 10.978 2335.815
and it was consuming all my systems 16Gb of RAM so I gave up at that point...
c()
has the logical argument recursive
which will recursively unlist a vector when set to TRUE
(default is obviously FALSE
).
l <- replicate(500, rnorm(1e3), simplify = F)
microbenchmark::microbenchmark(
unlist = unlist(l, FALSE, FALSE),
c = c(l, recursive = TRUE, use.names = FALSE)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# unlist 3.083424 3.121067 4.662491 3.172401 3.985668 27.35040 100
# c 3.084890 3.133779 4.090520 3.201246 3.920646 33.22832 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With