Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

challenge: optimize unlisting [easy]

Because SO is a bit slow lately, I'm posting an easy question. I would appreciate it if big fishes stayed on the bench for this one and give rookies a chance to respond.

Sometimes we have objects that have a ridiculous amount of large list elements (vectors). How would you "unlist" this object into a single vector. Show proof that your method is faster than unlist().

like image 442
Roman Luštrik Avatar asked Oct 25 '10 08:10

Roman Luštrik


3 Answers

If you don't need names and your list is one level deep, then if you can beat

.Internal(unlist(your_list, FALSE, FALSE))

I will vote up everything you do on SO for the next 1 year!!!

[Update: if non-unique names are needed and the list is not recursive, here is a version which improves over the unlist 100 times

 myunlist <- function(l){
    names <- names(l)
    vec <- unlist(l, F, F)
    reps <- unlist(lapply(l, length), F, F)
    names(vec) <- rep(names, reps)
    vec
    }

 myunlist(list(a=1:3, b=2))
 a a a b 
 1 2 3 2 

 > tl <- list(a = 1:20000, b = 1:5000, c = 2:30)
 > system.time(for(i in 1:200) unlist(tl))
 user  system elapsed 
 22.97    0.00   23.00 

 > system.time(for(i in 1:200) myunlist(tl))
 user  system elapsed 
 0.2     0.0     0.2 

 > system.time(for(i in 1:200) unlist(tl, F, F))
 user  system elapsed 
 0.02    0.00    0.02 

]

[Update2: Responce to challenge Nr3 from Richie Cotton.

bigList3 <- replicate(500, rnorm(1e3), simplify = F)

unlist_vit <- function(l){
    names(l) <- NULL
    do.call(c, l)
    }

library(rbenchmark)

benchmark(unlist = unlist(bigList3, FALSE, FALSE),
          rjc    = unlist_rjc(bigList3),
          vit    = unlist_vit(bigList3),
          order  = "elapsed",
          replications = 100,
          columns = c("test", "relative", "elapsed")
          )

    test  relative elapsed
1 unlist   1.0000    2.06
3    vit   1.4369    2.96
2    rjc   3.5146    7.24

]

PS: I assume a "big fish" is the one with more reputation than you. So I am pretty much small here :).

like image 119
VitoshKa Avatar answered Oct 15 '22 11:10

VitoshKa


A non-unlist() solution would have to be pretty darned fast to beat unlist() would it not? Here it takes less than two second to unlist a list with 2000 numeric vectors of length 100,000 each.

> bigList2 <- as.list(data.frame(matrix(rep(rnorm(1000000), times = 200), 
+                                       ncol = 2000)))
> print(object.size(bigList2), units = "Gb")
1.5 Gb
> system.time(foo <- unlist(bigList2, use.names = FALSE))
   user  system elapsed 
  1.897   0.000   2.019

With bigList2 and foo in my workspace, R is using ~9Gb of my available memory. The key is use.names = FALSE. Without it unlist() is painfully slow. Exactly how slow I'm still waiting to find out...

We can speed this up a little bit more by setting recursive = FALSE and then we have effectively the same as VitoshKa's answer (two representative timings):

> system.time(foo <- unlist(bigList2, recursive = FALSE, use.names = FALSE))
   user  system elapsed 
  1.379   0.001   1.416
> system.time(foo <- .Internal(unlist(bigList2, FALSE, FALSE)))
   user  system elapsed 
  1.335   0.000   1.344

... finally the use.names = TRUE version finished...:

> system.time(foo <- unlist(bigList2, use = TRUE))
    user   system  elapsed 
2307.839   10.978 2335.815

and it was consuming all my systems 16Gb of RAM so I gave up at that point...

like image 42
Gavin Simpson Avatar answered Oct 15 '22 13:10

Gavin Simpson


c() has the logical argument recursive which will recursively unlist a vector when set to TRUE (default is obviously FALSE).

l <- replicate(500, rnorm(1e3), simplify = F)

microbenchmark::microbenchmark(
  unlist = unlist(l, FALSE, FALSE),
  c = c(l, recursive = TRUE, use.names = FALSE)
)

# Unit: milliseconds
# expr      min       lq     mean   median       uq      max neval
# unlist 3.083424 3.121067 4.662491 3.172401 3.985668 27.35040   100
#      c 3.084890 3.133779 4.090520 3.201246 3.920646 33.22832   100
like image 38
seasmith Avatar answered Oct 15 '22 11:10

seasmith