I have a collection of n coordinate points of the form (x,y,z). These are stored in an n x 3 matrix M.
Is there a built-in function in Julia to calculate the distance between each point and every other point? I'm working with a small number of points, so calculation time isn't too important.
My overall goal is to run a clustering algorithm, so if there is a clustering algorithm that I can look at that doesn't require me to first calculate these distances please suggest that too. An example of the data I would like to perform clustering on is below. Obviously I'd only need to do this for the z coordinate.
Euclidean distance is the traditional metric for problems involving geometry: it is simply the ordinary straight-line distance between two points. It is one of the most commonly used distance metrics in cluster analysis.
Calculate the squared Euclidean distance from every data point to the centroids AB and CD. For example, the squared distance between A(2,3) and AB(4,2) is s = (2 - 4)² + (3 - 2)² = 4 + 1 = 5. The distance between A and CD is 4, which is smaller than the distance between A and AB (5), so A is closer to CD.
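The arithmetic above can be checked with a small Base-Julia sketch. The helper names `sqeuclid` and `euclid` are hypothetical, and CD is left out because its coordinates are not given here.

```julia
# Squared Euclidean distance between two points given as tuples or vectors.
sqeuclid(p, q) = sum((a - b)^2 for (a, b) in zip(p, q))

# The ordinary (non-squared) Euclidean distance is its square root.
euclid(p, q) = sqrt(sqeuclid(p, q))

A  = (2, 3)
AB = (4, 2)
s = sqeuclid(A, AB)   # (2 - 4)^2 + (3 - 2)^2 = 4 + 1 = 5
d = euclid(A, AB)     # sqrt(5), roughly 2.236
println(s)            # prints 5
```

Comparing squared distances gives the same "nearest centroid" answer as comparing true distances, because the square root is monotonic; skipping it just saves work.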
Hey, to my knowledge, the R function hclust can build a hierarchical clustering from a distance matrix given as input, such as the matrix produced by R's dist function.
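The same idea (clustering directly from a precomputed distance matrix) can be sketched in Base Julia. Clustering.jl also provides an `hclust` function that accepts a distance matrix, together with `cutree` for cutting the tree into clusters; check its documentation for the exact options. The `single_linkage` function below is a hypothetical, naive illustration of single-linkage agglomerative clustering (roughly O(n³) overall, fine for a small number of points) that stops once `k` clusters remain.

```julia
# Naive single-linkage agglomerative clustering from a symmetric distance
# matrix D. Merges the two closest clusters until only k remain, then
# returns one cluster label (1..k) per point.
function single_linkage(D::AbstractMatrix{<:Real}, k::Integer)
    n = size(D, 1)
    @assert size(D, 2) == n "D must be square"
    @assert 1 <= k <= n "k must be between 1 and n"
    clusters = [[i] for i in 1:n]            # every point starts alone
    while length(clusters) > k
        best = (Inf, 0, 0)
        # single linkage: cluster distance = distance of their closest members
        for a in 1:length(clusters)-1, b in a+1:length(clusters)
            d = minimum(D[i, j] for i in clusters[a], j in clusters[b])
            if d < best[1]
                best = (d, a, b)
            end
        end
        _, ia, ib = best
        append!(clusters[ia], clusters[ib])  # merge cluster ib into ia
        deleteat!(clusters, ib)
    end
    labels = zeros(Int, n)
    for (c, members) in enumerate(clusters), i in members
        labels[i] = c
    end
    return labels
end

D = [0.0 1.0 5.0 6.0;
     1.0 0.0 4.5 5.5;
     5.0 4.5 0.0 0.5;
     6.0 5.5 0.5 0.0]
labels = single_linkage(D, 2)   # [1, 1, 2, 2]
```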
To calculate distances, use the Distances package.
Given a matrix X, you can calculate pairwise distances between its columns. This means you should supply your input points (your n objects) as the columns of the matrix. (Your question mentions an n x 3 matrix, so you would have to transpose it with the transpose() function.)
Here is an example on how to use it:
julia> using Distances  # install with: using Pkg; Pkg.add("Distances")
julia> x = rand(3, 2)   # two 3-D points, one per column
3×2 Array{Float64,2}:
 0.27436   0.589142
 0.234363  0.728687
 0.265896  0.455243
julia> pairwise(Euclidean(), x, dims=2)  # dims=2: distances between columns
2×2 Array{Float64,2}:
 0.0       0.615871
 0.615871  0.0
As you can see, this returns the distance matrix between the columns of x. You can use other distance metrics if you need to; check the package documentation for the available ones.
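If you would rather avoid the dependency, the same distance matrix can be sketched in Base Julia directly on an n×3 matrix with one point per row, so no transpose is needed. `pairwise_rows` is a hypothetical helper, not part of Distances.jl; for anything beyond small inputs the package is the better choice.

```julia
# Pairwise Euclidean distances between the rows of an n×d matrix M,
# using only Base Julia. D[i, j] is the distance between points i and j.
function pairwise_rows(M::AbstractMatrix{<:Real})
    n = size(M, 1)
    D = zeros(Float64, n, n)            # diagonal stays 0
    for j in 1:n, i in j+1:n
        d = sqrt(sum(abs2, M[i, :] .- M[j, :]))
        D[i, j] = d
        D[j, i] = d                     # the matrix is symmetric
    end
    return D
end

M = [0.0 0.0 0.0;
     3.0 4.0 0.0;
     0.0 0.0 2.0]
D = pairwise_rows(M)   # D[1, 2] == 5.0, D[1, 3] == 2.0, D[2, 3] == sqrt(29)
```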
To complement @niczky12's answer: Julia also has a package called Clustering which, as the name says, lets you perform clustering.
Here is a sample kmeans run:
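julia> using Clustering  # install with: using Pkg; Pkg.add("Clustering")
julia> X = rand(3, 100)  # data, each column is a sample
julia> k = 10            # number of clusters
julia> r = kmeans(X, k)
julia> fieldnames(typeof(r))  # on Julia ≥ 1.0, pass the type rather than the value
# An older Clustering.jl release listed these 8 fields (names can change between versions):
# :centers, :assignments, :costs, :counts, :cweights, :totalcost, :iterations, :converged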
The result of kmeans is stored in r, which contains the fields above. The two most interesting ones are probably r.centers, which holds the cluster centers found by the algorithm, and r.assignments, which gives the cluster each of the 100 samples belongs to.
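To show what centers and assignments mean, here is a minimal Lloyd's k-means sketch in Base Julia. `simple_kmeans` is a hypothetical helper, not Clustering.jl's implementation: it seeds the centers with the first k columns instead of a smarter initialization, and it returns a plain tuple rather than a result object.

```julia
# Minimal Lloyd's k-means on the columns of X (d×n), Base Julia only.
# Seeding with the first k columns is a simplification for illustration.
function simple_kmeans(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer = 100)
    n = size(X, 2)
    @assert 1 <= k <= n "k must be between 1 and the number of columns"
    centers = Float64.(X[:, 1:k])
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # assignment step: nearest center by squared Euclidean distance
        changed = false
        for i in 1:n
            best = argmin([sum(abs2, X[:, i] .- centers[:, c]) for c in 1:k])
            if best != assignments[i]
                assignments[i] = best
                changed = true
            end
        end
        changed || break   # converged: no point switched cluster
        # update step: each center moves to the mean of its members
        for c in 1:k
            members = findall(==(c), assignments)
            if !isempty(members)
                centers[:, c] = vec(sum(X[:, members], dims = 2)) ./ length(members)
            end
        end
    end
    return centers, assignments
end

X = [0.0 0.2 10.0 10.2;
     0.0 0.1 10.0  9.9]
centers, assignments = simple_kmeans(X, 2)   # assignments == [1, 1, 2, 2]
```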
There are several other clustering methods in the same package. Feel free to dive into the documentation and apply the one that best suits your needs.
In your case, since your data is an N x 3 matrix, you only need to transpose it:
M = rand(100, 3)
kmeans(M', k)
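Since the question notes that only the z coordinate matters, here is a small Base-Julia sketch of both layouts: all three coordinates as columns, or just z as a 1×n matrix. `permutedims` makes a plain copy, whereas `M'` is a lazy adjoint; for real-valued data either should work. Passing the result on to `kmeans` assumes Clustering.jl is loaded.

```julia
M = rand(100, 3)              # n×3: one point (x, y, z) per row
Xt = permutedims(M)           # 3×100: one point per column
Z  = reshape(M[:, 3], 1, :)   # 1×100: only the z coordinates, one per column
# With Clustering.jl loaded: kmeans(Xt, k) clusters on (x, y, z),
# kmeans(Z, k) clusters on z alone.
```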