 

Clustering and distance calculation in Julia

I have a collection of n coordinate points of the form (x,y,z). These are stored in an n x 3 matrix M.

Is there a built-in function in Julia to calculate the distance between each point and every other point? I'm working with a small number of points, so calculation time isn't too important.

My overall goal is to run a clustering algorithm, so if there is a clustering algorithm that I can look at that doesn't require me to first calculate these distances please suggest that too. An example of the data I would like to perform clustering on is below. Obviously I'd only need to do this for the z coordinate.

[Image: example of the data set I need to perform clustering on]

asked Apr 12 '16 at 03:04 by lara



2 Answers

To calculate distances use the Distances package.

Given a matrix X, you can calculate the pairwise distances between its columns. This means your input points (your n objects) should be the columns of the matrix. Your M is an n x 3 matrix, so you would have to transpose it with transpose(M) (or M'); recent versions of Distances.jl also accept the keyword dims=1 to compare rows directly.

Here is an example on how to use it:

julia> using Distances  # install with: using Pkg; Pkg.add("Distances")

julia> x = rand(3, 2)
3×2 Matrix{Float64}:
 0.27436   0.589142
 0.234363  0.728687
 0.265896  0.455243

julia> pairwise(Euclidean(), x; dims=2)  # dims=2: distances between columns
2×2 Matrix{Float64}:
 0.0       0.615871
 0.615871  0.0

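As you can see, this returns the distance matrix between the columns of x: entry (i, j) is the distance between column i and column j, so the matrix is symmetric with a zero diagonal. You can use other distance metrics if you need to; just check the package documentation.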
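If you would rather not add a dependency for a handful of points, the same matrix is easy to compute in plain Julia. This is a sketch for illustration, not part of Distances.jl; column_distances is a hypothetical helper name. It reuses the printed values of x from the transcript above and also shows the transpose an n x 3 matrix needs.

```julia
# Hypothetical helper (not from Distances.jl): Euclidean distance between
# every pair of columns of X, like pairwise(Euclidean(), X; dims=2).
function column_distances(X::AbstractMatrix{<:Real})
    n = size(X, 2)                        # number of points (columns)
    D = zeros(float(eltype(X)), n, n)     # symmetric, zero diagonal
    for j in 1:n, i in 1:j-1
        d = sqrt(sum(abs2, view(X, :, i) .- view(X, :, j)))
        D[i, j] = d
        D[j, i] = d
    end
    return D
end

# The 3x2 matrix from the transcript: two 3-D points stored as columns.
x = [0.27436  0.589142;
     0.234363 0.728687;
     0.265896 0.455243]
D = column_distances(x)                   # D[1, 2] is about 0.61587

# For an n x 3 matrix M (one point per row), transpose first.
M = rand(5, 3)
DM = column_distances(permutedims(M))     # 5 x 5 distance matrix
```

The double loop only fills the upper triangle and mirrors it, since Euclidean distance is symmetric.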

answered Sep 19 '22 at 22:09 by niczky12


To complement @niczky12's answer, Julia has a package called Clustering which, as the name says, lets you perform clustering.

A sample k-means run:

julia> using Clustering          # using Pkg; Pkg.add("Clustering") if not installed

julia> X = rand(3, 100);         # data, each column is a sample

julia> k = 10;                   # number of clusters

julia> r = kmeans(X, k);

julia> fieldnames(typeof(r))
(:centers, :assignments, :costs, :counts, :wcounts, :totalcost, :iterations, :converged)

The result returned by kmeans (r) contains the fields above. The two most interesting are probably r.centers, a 3 x 10 matrix holding the cluster centers found by k-means (one per column), and r.assignments, which gives the cluster each of the 100 samples belongs to.
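To make r.centers and r.assignments concrete, here is a minimal sketch of Lloyd's algorithm, the classic k-means iteration, in plain Julia. It is an illustration only, not Clustering.jl's implementation: lloyd_kmeans is a hypothetical name, it takes the starting centers as an argument instead of choosing them itself, and it stops as soon as no point changes cluster (or after maxiter rounds).

```julia
# Hypothetical sketch of Lloyd's k-means (not Clustering.jl's code).
# X is d x n with one point per column, as for kmeans(X, k);
# init_centers is d x k and holds the starting centers.
function lloyd_kmeans(X::AbstractMatrix{<:Real}, init_centers::AbstractMatrix{<:Real};
                      maxiter::Int = 100)
    C = float.(init_centers)          # working copy of the centers (d x k)
    n, k = size(X, 2), size(C, 2)
    a = zeros(Int, n)                 # cluster index of each point
    for _ in 1:maxiter
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        changed = false
        for i in 1:n
            best = argmin([sum(abs2, X[:, i] .- C[:, j]) for j in 1:k])
            if best != a[i]
                a[i] = best
                changed = true
            end
        end
        changed || break              # no point moved: converged
        # Update step: each non-empty cluster's center becomes the mean of its points.
        for j in 1:k
            members = findall(==(j), a)
            if !isempty(members)
                C[:, j] = vec(sum(X[:, members]; dims = 2)) ./ length(members)
            end
        end
    end
    return C, a                       # analogous to r.centers and r.assignments
end

# Usage: two well-separated groups of 2-D points, seeded with points 1 and 4.
pts = [0.0 0.1 0.2 10.0 10.1 10.2;
       0.0 0.1 0.0 10.0 10.1 10.0]
ctrs, labels = lloyd_kmeans(pts, pts[:, [1, 4]])
# labels == [1, 1, 1, 2, 2, 2]
```

Results of k-means depend on the starting centers, which is why a real implementation puts effort into seeding; this sketch leaves that choice to the caller.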

There are several other clustering methods in the same package. Feel free to dive into the documentation and apply the one that best suits your needs.


In your case, since your data is an n x 3 matrix (one point per row), you only need to transpose it:

M = rand(100, 3)
kmeans(M', k)
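Since you only need to cluster on the z coordinate, you can instead give kmeans a 1 x n matrix built from the third column of M. A minimal sketch, assuming M holds one (x, y, z) point per row as above; the random M is placeholder data, and the kmeans call is left commented out so the snippet runs without Clustering loaded.

```julia
# Cluster on z only: kmeans expects one sample per column,
# so turn the z column into a 1 x n matrix.
M = rand(100, 3)          # placeholder data; rows are (x, y, z) points
z = M[:, 3]               # vector of the 100 z values
Z = reshape(z, 1, :)      # 1 x 100 matrix: one 1-D sample per column
# r = kmeans(Z, k)        # with Clustering loaded, as above
```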
answered Sep 19 '22 at 22:09 by Imanol Luengo