This question came up today on the manipulatr mailing list.
http://groups.google.com/group/manipulatr/browse_thread/thread/fbab76945f7cba3f
I am rephrasing it here.
Given a distance matrix (calculated with dist), apply a function to the rows of the distance matrix.
Code:
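library(plyr)
N <- 100
a <- data.frame(b = 1:N, c = runif(N))
# diag and upper only affect printing; d still stores just the lower triangle
d <- dist(a, diag = TRUE, upper = TRUE)
# as.matrix() expands d to a full N x N matrix before adply() sums each row
sumd <- adply(as.matrix(d), 1, sum)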
The problem is that to apply the function by row you have to store the whole matrix (instead of just the lower triangular part), so it uses too much memory for large matrices. It fails on my computer for matrices of dimension ~10000.
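To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. It assumes 8 bytes per double and counts only the raw values, so peak usage in practice is likely higher once as.matrix() and adply() make their own copies. The variable names are illustrative only.

```r
# Rough storage estimate for N = 10000 observations (assumes 8-byte doubles).
N <- 10000
bytes_per_double <- 8

# Full N x N matrix, as produced by as.matrix(d)
full_mb <- N^2 * bytes_per_double / 1e6

# Lower triangle only, as stored in the dist object: N * (N - 1) / 2 values
tri_mb <- N * (N - 1) / 2 * bytes_per_double / 1e6

cat("full matrix:    ", full_mb, "MB\n")
cat("lower triangle: ", tri_mb, "MB\n")
```

The full matrix needs about twice the storage of the dist object, before counting any intermediate copies.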
Any ideas?
The dist() function in R computes a distance matrix, that is, the distances between every pair of rows of a matrix or data frame. Its main arguments are x, the matrix or data frame holding the observations and features, and method, the distance measure to use. If method is not given, dist() computes the Euclidean distance.
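As a small illustration of the description above, the sketch below calls dist() on a made-up three-row matrix (the data are an assumption chosen so the distances come out as whole numbers). It also shows that the returned object holds only N*(N-1)/2 values, while as.matrix() expands it to the full N x N form.

```r
# Three made-up points in 2-D; rows are observations.
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d_euc <- dist(x)                        # Euclidean distance is the default
d_man <- dist(x, method = "manhattan")  # a different distance measure

# The dist object stores only the lower triangle, column by column:
# pairs (2,1), (3,1), (3,2)
n_stored <- length(d_euc)

# as.matrix() expands it to a full symmetric matrix with a zero diagonal
m <- as.matrix(d_euc)

print(d_euc)
print(dim(m))
```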
First of all, for anyone who hasn't seen this yet, I strongly recommend reading this article on the r-wiki about code optimization.
Here's another version without using ifelse (that's a relatively slow function):
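noeq.2 <- function(i, j, N) {
  # Switch to zero-based row and column indices
  i <- i - 1
  j <- j - 1
  # Position in the dist vector when i < j (x) and when j < i (x2)
  x <- i * (N - 1) - (i - 1) * ((i - 1) + 1) / 2 + j - i
  x2 <- j * (N - 1) - (j - 1) * ((j - 1) + 1) / 2 + i - j
  idx <- i < j
  x[!idx] <- x2[!idx]
  # Diagonal entries are not stored in a dist object, so map them to 0
  x[i == j] <- 0
  x
}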
And timings on my laptop:
> N <- 1000
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
user system elapsed
51.31 0.10 52.06
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
user system elapsed
2.47 0.02 2.67
> system.time(sapply(1:N, function(j) noeq.2(1:N, j, N)))
user system elapsed
0.88 0.01 1.12
And lapply is faster than sapply:
> system.time(do.call("rbind",lapply(1:N, function(j) noeq.2(1:N, j, N))))
user system elapsed
0.67 0.00 0.67
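To tie the index function back to the original question, here is a minimal sketch of row sums computed straight from the dist vector, one row at a time, without calling as.matrix(). noeq.2 is repeated from above so the block runs on its own; row_sums_dist is a hypothetical helper name, and the data are random test data. It relies on the fact that indexing with 0 in R selects nothing, so the diagonal contributes nothing to the sum.

```r
# Index helper from the answer above, repeated so this block is self-contained.
noeq.2 <- function(i, j, N) {
  i <- i - 1
  j <- j - 1
  x <- i * (N - 1) - (i - 1) * ((i - 1) + 1) / 2 + j - i
  x2 <- j * (N - 1) - (j - 1) * ((j - 1) + 1) / 2 + i - j
  idx <- i < j
  x[!idx] <- x2[!idx]
  x[i == j] <- 0
  x
}

# Hypothetical helper: sum each row of the distance matrix using only
# the lower-triangle vector stored in the dist object.
row_sums_dist <- function(d) {
  N <- attr(d, "Size")   # number of observations
  v <- as.vector(d)      # plain numeric vector of length N * (N - 1) / 2
  vapply(seq_len(N), function(j) {
    k <- noeq.2(seq_len(N), j, N)  # positions of row j's entries; 0 on the diagonal
    sum(v[k])                      # a 0 index is dropped, so the diagonal is skipped
  }, numeric(1))
}

set.seed(1)
N <- 200
a <- data.frame(b = 1:N, c = runif(N))
d <- dist(a)

sums_from_vector <- row_sums_dist(d)
sums_from_matrix <- unname(rowSums(as.matrix(d)))  # small N, so this check is cheap
print(all.equal(sums_from_vector, sums_from_matrix))
```

Each iteration touches only N index values, so the extra memory beyond the dist object itself stays linear in N. Whether this is fast enough for N around 10000 has not been measured here.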