Applying a function to a distance matrix in R

Tags: algorithm, r

This question came up today on the manipulatr mailing list.

http://groups.google.com/group/manipulatr/browse_thread/thread/fbab76945f7cba3f

I am rephrasing.

Given a distance matrix (calculated with dist), apply a function to each row of the distance matrix.

Code:

library(plyr)
N <- 100
a <- data.frame(b = 1:N, c = runif(N))
d <- dist(a, diag = TRUE, upper = TRUE)
sumd <- adply(as.matrix(d), 1, sum)

The problem is that to apply the function by row you have to store the whole matrix (instead of just the lower triangular part), so it uses too much memory for large matrices. It fails on my computer for matrices of dimension ~10000.
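To put rough numbers on the memory cost, here is a back-of-the-envelope sketch. It assumes 8 bytes per stored distance and ignores any temporary copies that as.matrix or adply may make, so real usage is likely higher.

```r
# Assumption: each distance is stored as an 8-byte double.
bytes_per_double <- 8
N <- 10000

full_bytes <- N * N * bytes_per_double            # full N x N matrix
tri_bytes  <- N * (N - 1) / 2 * bytes_per_double  # what a dist object stores

# Approximate sizes in MiB
round(c(full = full_bytes, dist = tri_bytes) / 2^20)

# Sanity check on a small case: dist keeps N * (N - 1) / 2 values
small <- dist(matrix(runif(20), nrow = 10))
length(small) == 10 * 9 / 2
```

The full matrix is roughly twice the size of the dist object, before counting any copies.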

Any ideas?

asked Nov 07 '09 07:11

Eduardo Leoni

1 Answer

First of all, for anyone who hasn't seen this yet, I strongly recommend reading this article on the r-wiki about code optimization.

Here's another version that avoids ifelse (a relatively slow function):

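# Map matrix positions (i, j) to positions in the vector stored by dist().
# i or j may be vectors. Diagonal entries (i == j) map to 0.
noeq.2 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    x <- i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i    # index when i < j
    x2 <- j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j   # index when i > j
    idx <- i < j
    x[!idx] <- x2[!idx]
    x[i==j] <- 0
    x
}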
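As a quick check of what noeq.2 returns, the sketch below compares one row rebuilt from the dist vector with the same row of the full matrix. The small N, the seed, and the names k and row_j are illustrative choices, and the full matrix is built here only to verify the result.

```r
# noeq.2 is copied from the answer so this block runs on its own.
noeq.2 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    x <- i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i
    x2 <- j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j
    idx <- i < j
    x[!idx] <- x2[!idx]
    x[i==j] <- 0
    x
}

set.seed(1)
N <- 6
a <- data.frame(b = 1:N, c = runif(N))
d <- dist(a)        # stores only N * (N - 1) / 2 values
m <- as.matrix(d)   # full matrix, used only for the comparison

j <- 3
k <- noeq.2(1:N, j, N)        # positions of row j in d; 0 marks the diagonal
row_j <- numeric(N)           # the diagonal entry stays 0
row_j[k > 0] <- d[k[k > 0]]   # plain vector indexing into the dist object

all.equal(row_j, unname(m[j, ]))
```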

And timings on my laptop:

> N <- 1000
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
   user  system elapsed 
  51.31    0.10   52.06 
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
   user  system elapsed 
   2.47    0.02    2.67 
> system.time(sapply(1:N, function(j) noeq.2(1:N, j, N)))
   user  system elapsed 
   0.88    0.01    1.12

And lapply is faster than sapply:

> system.time(do.call("rbind",lapply(1:N, function(j) noeq.2(1:N, j, N))))
   user  system elapsed 
   0.67    0.00    0.67
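Putting the pieces together for the original question, one possible sketch applies a function to each row while indexing the dist object directly, so the full matrix is never built. dist_row_apply is a hypothetical helper name, not part of plyr or base R, and noeq.2 is repeated only so the block runs on its own.

```r
# noeq.2 is copied from the answer so this block runs on its own.
noeq.2 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    x <- i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i
    x2 <- j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j
    idx <- i < j
    x[!idx] <- x2[!idx]
    x[i==j] <- 0
    x
}

# Hypothetical helper: apply f to every row of the distance matrix implied
# by the dist object d. Only one row of length N is materialised at a time.
dist_row_apply <- function(d, f) {
    N <- attr(d, "Size")              # number of observations behind d
    unlist(lapply(seq_len(N), function(j) {
        k <- noeq.2(seq_len(N), j, N)
        row_j <- numeric(N)           # diagonal entry stays 0
        row_j[k > 0] <- d[k[k > 0]]
        f(row_j)
    }))
}

set.seed(42)
N <- 100
a <- data.frame(b = 1:N, c = runif(N))
d <- dist(a)
sumd <- dist_row_apply(d, sum)

# Check against the full-matrix approach on this small example
all.equal(sumd, unname(rowSums(as.matrix(d))))
```

This trades speed for memory: each row is rebuilt from the index on every call, so it is slower than working on the full matrix when that matrix fits in memory.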

answered Sep 22 '22 01:09

Shane
