
Use outer instead of expand.grid


I'm looking for as much speed as possible while staying in base R to do what expand.grid does. I have used outer for similar purposes in the past to create a vector; something like this:

    v <- outer(letters, LETTERS, paste0)
    unlist(v[lower.tri(v)])

Benchmarking has shown me that outer can be drastically faster than expand.grid. This time, though, I want to create two columns just like expand.grid does (all possible combinations of two vectors), and my outer-based methods do not benchmark as fast.

I'm hoping to take 2 vectors and create every possible combination as two columns, as fast as possible (I think outer may be the route, but I am wide open to any base method).

Here are the expand.grid method and the outer method:

    dat <- cbind(mtcars, mtcars, mtcars)

    expand.grid(seq_len(nrow(dat)), seq_len(ncol(dat)))

    FOO <- function(x, y) paste(x, y, sep=":")
    x <- outer(seq_len(nrow(dat)), seq_len(ncol(dat)), FOO)
    apply(do.call("rbind", strsplit(x, ":")), 2, as.integer)

The microbenchmarking shows outer is slower:

    #     expr      min        lq    median        uq      max
    # EXPAND.G  812.743  838.6375  894.6245  927.7505 27029.54
    #    OUTER 5107.871 5198.3835 5329.4860 5605.2215 27559.08

I think my use of outer is slow because I don't know how to make outer directly create length-2 vectors that I can combine with do.call("rbind", ...). Instead I have to use a slow paste and a slow strsplit. How can I do this with outer (or other methods in base) in a way that's faster than expand.grid?
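To illustrate why the paste/strsplit round trip comes up: outer calls its function once, with both arguments already expanded to full length, and expects one value back per pair, so it cannot return a length-2 vector per cell. The sketch below uses small made-up inputs and a throwaway variable `seen` (not from the question) to capture what the function receives. Those two expanded vectors happen to be the two columns being asked for.

```r
x <- 1:3
y <- 1:2

# outer() calls FUN a single time with both arguments expanded to
# length(x) * length(y); `seen` is an illustrative variable that records them.
seen <- NULL
res <- outer(x, y, function(a, b) {
  seen <<- list(a = a, b = b)  # capture the expanded arguments
  a + b                        # FUN must return one value per (a, b) pair
})

seen$a  # 1 2 3 1 2 3
seen$b  # 1 1 1 2 2 2

# The expanded arguments already form the two columns of the grid
cbind(Var1 = seen$a, Var2 = seen$b)
```

So the work outer does before calling the function is the same repetition that expand.grid needs, which suggests building the two columns directly with rep or rep.int rather than going through outer.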

EDIT: Adding the microbenchmark results.

    Unit: microseconds
          expr     min       lq  median      uq       max
    1   ERNEST  34.993  39.1920  52.255  57.854 29170.705
    2     JOHN  13.997  16.3300  19.130  23.329   266.872
    3 ORIGINAL 352.720 372.7815 392.377 418.738 36519.952
    4    TOMMY  16.330  19.5960  23.795  27.061  6217.374
    5  VINCENT 377.447 400.3090 418.505 451.864 43567.334


Tyler Rinker asked May 01 '12 23:05



1 Answer

The documentation for rep.int isn't quite complete. It isn't just the fastest option in the most common case: you can also pass a vector for the times argument, just as with rep. You can use it directly for both sequences, reducing the time by another 40% or so compared with Tommy's version.

    expand.grid.jc <- function(seq1, seq2) {
        cbind(Var1 = rep.int(seq1, length(seq2)),
              Var2 = rep.int(seq2, rep.int(length(seq1), length(seq2))))
    }
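As a quick sanity check, the function can be compared against expand.grid on the data from the question. This is only a sketch: the names `rows`, `cols`, `fast`, and `ref` are illustrative, and the function is repeated so the snippet runs on its own. Note that it returns a matrix rather than a data frame.

```r
# Repeated from the answer so this snippet is self-contained
expand.grid.jc <- function(seq1, seq2) {
  cbind(Var1 = rep.int(seq1, length(seq2)),
        Var2 = rep.int(seq2, rep.int(length(seq1), length(seq2))))
}

dat  <- cbind(mtcars, mtcars, mtcars)
rows <- seq_len(nrow(dat))
cols <- seq_len(ncol(dat))

fast <- expand.grid.jc(rows, cols)  # integer matrix with columns Var1, Var2
ref  <- expand.grid(rows, cols)     # data frame with columns Var1, Var2

# Same combinations in the same order
all(fast[, "Var1"] == ref$Var1)
all(fast[, "Var2"] == ref$Var2)
dim(fast)
```

The first rep.int call cycles the whole of seq1 once per element of seq2, while the second repeats each element of seq2 length(seq1) times, which matches the ordering expand.grid uses (first factor varying fastest).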
John answered Sep 21 '22 17:09
