
dist function with large number of points

Tags:

r

I am using the dist {stats} function to calculate the distances between points. My problem is that I have 24469 points, and dist returns a vector of length 18705786 instead of a matrix. I already tried exporting it with as.matrix, but the file is too large.

How can I find out which points each distance corresponds to?

For example, which(distance<=700) gives me the positions in the vector, but how can I find out which pair of points each of those distances belongs to?

Gago-Silva asked Apr 24 '13 10:04


1 Answer

There are some things you could try, depending on what exactly you need:

  • Calculate the distances in a loop and keep only those that match the criterion. Especially when the number of matches is much smaller than the total size of the distance matrix, this saves a lot of RAM. Such a loop is probably very slow if implemented in pure R, which is also why dist does not use R but, I believe, C to perform the calculations. So you would get your results, but might have to wait a while. Alternatively, the excellent Rcpp package lets you write the loop in C/C++, which would probably make it much faster.
  • Use a package like bigmemory to store the distance matrix. You build the matrix in a loop and store it piece by piece in the bigmemory object (I have not worked with bigmemory before, so I don't know the exact details). Once the matrix is built, you can access it to extract the results you want. Effectively, all the tricks for handling large data in R apply to this bullet. See e.g. the R posts on big data here on SO.
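To make the first bullet a bit more concrete, here is a minimal sketch of the loop idea in plain R. It assumes Euclidean distance (the default of dist) and that the points are the rows of a numeric matrix; the function name pairs_within and its arguments are made up for this illustration. It is not tuned for speed, only meant to show that the full distance matrix never has to exist in memory:

```r
# Hypothetical helper (not part of any package): returns every pair (i, j),
# i < j, whose Euclidean distance is at most `threshold`.
pairs_within <- function(pts, threshold) {
  pts <- as.matrix(pts)
  n <- nrow(pts)
  hits <- vector("list", max(n - 1L, 0L))
  for (i in seq_len(n - 1L)) {
    j <- (i + 1L):n
    # Coordinate differences between point i and all later points.
    diffs <- sweep(pts[j, , drop = FALSE], 2, pts[i, ])
    d <- unname(sqrt(rowSums(diffs^2)))
    keep <- d <= threshold
    if (any(keep)) {
      # Only the matching pairs are stored, so memory use grows with
      # the number of matches rather than with n^2.
      hits[[i]] <- data.frame(i = i, j = j[keep], distance = d[keep])
    }
  }
  # NULL if there was no match at all, otherwise one row per matching pair.
  do.call(rbind, hits)
}

# Small made-up example: four points in the plane.
pts <- rbind(c(0, 0), c(3, 4), c(0, 1), c(10, 10))
res <- pairs_within(pts, threshold = 5)
print(res)
```

The columns i and j are row numbers of the input, so they tell you directly which two points each kept distance belongs to. For your data the call would be something like pairs_within(your_points, 700). Each iteration only holds the distances from one point to the remaining ones, at the cost of an R-level loop over all points.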

Some interesting links (found by googling for "r distance matrix for large vector"):

  • Efficient (memory-wise) function for repeated distance matrix calculations AND chunking of extra large distance matrices
  • (lucky you!) http://stevemosher.wordpress.com/2012/04/08/using-bigmemory-for-a-distance-matrix/
Paul Hiemstra answered Sep 27 '22 20:09