Clustering and distance calculation in Julia

Tags:

hierarchical-clustering

I have a collection of n coordinate points of the form (x,y,z). These are stored in an n x 3 matrix M.

Is there a built in function in Julia to calculate the distance between each point and every other point? I'm working with a small number of points so calculation time isn't too important.

My overall goal is to run a clustering algorithm, so if there is a clustering algorithm that I can look at that doesn't require me to first calculate these distances please suggest that too. An example of the data I would like to perform clustering on is below. Obviously I'd only need to do this for the z coordinate.

[Image: example of the data set I need to perform clustering on]

asked Apr 12 '16 03:04

lara

2 Answers

To calculate distances use the Distances package.

Given a matrix X, you can calculate the pairwise distances between its columns. This means each of your n points should be a column of the matrix. (Your question mentions an n x 3 matrix, so you would have to transpose it first, for example with transpose(M) or M'.)

Here is an example on how to use it:

>using Distances  # install with: using Pkg; Pkg.add("Distances")

>x = rand(3,2)

3x2 Array{Float64,2}:
 0.27436   0.589142
 0.234363  0.728687
 0.265896  0.455243

>pairwise(Euclidean(), x, dims=2)  # dims=2: each column of x is one point

2x2 Array{Float64,2}:
 0.0       0.615871
 0.615871  0.0

As you can see, this returns the matrix of distances between the columns of x. Other distance metrics are available if you need them; check the package documentation.
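If transposing feels awkward, recent releases of Distances let you keep the n x 3 layout and pass dims=1, so that each row is treated as a point. The sketch below uses a small made-up matrix M (hypothetical data, not taken from the question), also computes the z-only distances the question mentions, and cross-checks the result with a plain comprehension from the standard library.

```julia
using Distances        # third-party package: Pkg.add("Distances")
using LinearAlgebra    # standard library, used only for the cross-check

# Hypothetical stand-in for the n x 3 matrix M (one point per row).
M = [0.0 0.0 0.0;
     3.0 4.0 0.0;
     0.0 0.0 2.0]

# dims=1 treats each row as a point, so no transpose is needed.
D = pairwise(Euclidean(), M, dims=1)

# Distances along z only: keep the third column as an n x 1 matrix.
Dz = pairwise(Euclidean(), M[:, 3:3], dims=1)

# Cross-check: the same n x n matrix built by hand.
n = size(M, 1)
D_manual = [norm(M[i, :] - M[j, :]) for i in 1:n, j in 1:n]
```

For a small number of points the comprehension alone is enough; the package mainly adds speed and a choice of metrics.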

answered Sep 19 '22 22:09

niczky12

To complement @niczky12's answer: there is a Julia package called Clustering which, as the name says, lets you perform clustering.

A sample kmeans algorithm:

>>> using Clustering         # Pkg.add("Clustering") if not installed

>>> X = rand(3, 100)         # data, each column is a sample
>>> k = 10                   # number of clusters

>>> r = kmeans(X, k)
>>> fieldnames(typeof(r))    # Julia 1.x expects the type; older versions accepted fieldnames(r)
8-element Array{Symbol,1}:
:centers    
:assignments
:costs      
:counts     
:cweights   
:totalcost  
:iterations 
:converged

The result is stored in the return value of kmeans (r), which contains the fields above. The two that are probably most interesting: r.centers contains the centers found by the kmeans algorithm, and r.assignments contains the cluster to which each of the 100 samples belongs.

There are several other clustering methods in the same package. Feel free to dive into the documentation and apply the one that best suits your needs.
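Since the question is tagged hierarchical-clustering, one of those other methods may be closer to what was asked: hclust in the Clustering package works directly on a distance matrix. A minimal sketch, assuming both Clustering and Distances are installed; the data, the average linkage, and the choice of two clusters are made up for illustration.

```julia
using Clustering   # Pkg.add("Clustering")
using Distances    # Pkg.add("Distances")

# Hypothetical data: six points in two well-separated groups, one point per row.
M = [0.0 0.0 0.0;
     0.1 0.0 0.1;
     0.0 0.1 0.2;
     5.0 5.0 5.0;
     5.1 5.0 5.2;
     5.0 5.1 5.1]

D = pairwise(Euclidean(), M, dims=1)   # symmetric n x n distance matrix

tree = hclust(D, linkage=:average)     # build the dendrogram
labels = cutree(tree, k=2)             # cut it into two clusters
```

Here labels holds one cluster index per row of M. Other linkages (for example :single or :complete) can be swapped in depending on how compact the clusters should be.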

In your case, since your data is an N x 3 matrix, you only need to transpose it:

M = rand(100, 3)
kmeans(M', k)
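The question mentions that only the z coordinate matters for the clustering. A sketch of that variant, under the assumption that z is the third column of M: kmeans expects a features x samples matrix, so the column is turned into a 1 x n matrix first. The data and the choice of three clusters are hypothetical.

```julia
using Clustering   # Pkg.add("Clustering")

# Hypothetical n x 3 matrix whose z values fall into three bands.
z = vcat(0.1 .* rand(10), 5 .+ 0.1 .* rand(10), 10 .+ 0.1 .* rand(10))
M = hcat(rand(30), rand(30), z)

# One feature (z), one column per point: a 1 x n matrix.
Z = permutedims(M[:, 3])

r = kmeans(Z, 3)          # three clusters, chosen for this made-up data
a = assignments(r)        # cluster index of each point
c = r.centers             # 1 x 3 matrix of cluster centers along z
```

Because kmeans starts from a random initialization, the cluster numbering (and occasionally the partition itself) can differ between runs.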

answered Sep 19 '22 22:09

Imanol Luengo
