Parallel distance matrix in R

Currently I'm using the built-in function dist to calculate my distance matrix in R.

dist(featureVector,method="manhattan")

This is currently the bottleneck of the application, so the idea was to parallelize this task (conceptually this should be possible).

Searching Google and this forum turned up nothing.

Does anybody have an idea?

asked Jun 16 '13 22:06

Vespasian

2 Answers

The R package amap provides robust, parallelized functions for clustering and principal component analysis. Among them, the Dist function does what you are looking for: it computes and returns the distance matrix in parallel.

Dist(x, method = "euclidean", nbproc = 8)

The code above computes the Euclidean distance with 8 threads. For the Manhattan distance from the question, pass method = "manhattan" instead.
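As a rough sketch of how this maps onto the question's Manhattan case (not part of the original answer): the snippet below assumes the amap package is installed, uses a small random matrix named featureVector as stand-in data, and picks nbproc = 2 arbitrarily. It compares the result with base dist() as a sanity check.

```r
library(amap)

# Stand-in data (an assumption): 200 observations, 10 features each
set.seed(1)
featureVector <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Parallel Manhattan distance with amap; nbproc = 2 is an arbitrary choice
d_par <- Dist(featureVector, method = "manhattan", nbproc = 2)

# Reference result from the built-in, single-threaded dist()
d_ref <- dist(featureVector, method = "manhattan")

# Compare the two lower triangles with a loose numerical tolerance
same <- all.equal(as.vector(d_par), as.vector(d_ref), tolerance = 1e-6)
print(same)
```

Whether the parallel version actually wins depends on the matrix size and the number of cores, so it is worth timing both calls with system.time() on your real data.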

answered Oct 13 '22 08:10

Zhilong Jia

Here's the structure for one route you could take. It is not faster than just calling dist(); in fact, it takes many times longer. It does run in parallel, but even if the computation time were reduced to zero, the time needed to start the workers and export the variables to the cluster would probably exceed the time dist() takes on its own.

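library(parallel)

# Example data: 2000 observations with 100 features each
vec.array <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

# Manhattan (taxicab) distance from one row vector to every row of the matrix
TaxiDistFun <- function(one.vec, whole.matrix) {
    diff.matrix <- t(t(whole.matrix) - one.vec)
    this.row <- apply(diff.matrix, 1, function(x) sum(abs(x)))
    return(this.row)
}

# Start one worker per core and copy the data and the function to each worker
cl <- makeCluster(detectCores())
clusterExport(cl, c("vec.array", "TaxiDistFun"))

# Compute one row of distances per input row, spread across the workers
system.time(dist.array <- parRapply(cl, vec.array,
                        function(x) TaxiDistFun(x, vec.array)))

stopCluster(cl)

# parRapply returns a flat vector; reshape it into the 2000 x 2000 matrix
dim(dist.array) <- c(2000, 2000)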
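One possible variation, sketched here as an illustration rather than as the answer's own method: hand each worker a whole block of rows instead of one row at a time, so the matrix is shipped once per worker. The names block_manhattan, x and n_workers are made up for this sketch, the data is deliberately small, and no speed-up over dist() is claimed; the last line only checks that the result matches.

```r
library(parallel)

# Small stand-in data (an assumption): 300 observations, 20 features
set.seed(42)
x <- matrix(rnorm(300 * 20), nrow = 300, ncol = 20)

# Manhattan distances from the rows listed in idx to every row of m
block_manhattan <- function(idx, m) {
  out <- matrix(0, nrow = length(idx), ncol = nrow(m))
  for (k in seq_along(idx)) {
    # Subtract row idx[k] from every row, then sum the absolute differences
    out[k, ] <- rowSums(abs(sweep(m, 2, m[idx[k], ])))
  }
  out
}

n_workers <- 2  # arbitrary; detectCores() is another option
cl <- makeCluster(n_workers)

# One contiguous block of row indices per worker
blocks <- splitIndices(nrow(x), n_workers)

# Each worker receives its index block plus the matrix as an argument
res <- parLapply(cl, blocks, block_manhattan, m = x)
stopCluster(cl)

# Stack the per-worker blocks back into the full 300 x 300 matrix
dist_par <- do.call(rbind, res)

# Largest absolute deviation from the built-in result
max_err <- max(abs(dist_par - as.matrix(dist(x, method = "manhattan"))))
```

This still computes every pair twice (the matrix is symmetric), so it is best read as a structural template for chunking work, not as a tuned implementation.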

answered Oct 13 '22 08:10

Will Beason

Tags: r, parallel-processing, matrix, distance, spatial
