I am looking for an efficient algorithm for the following problem: Given a set of points in 2D space, where each point is defined by its X and Y coordinates. Required to split this set of points into a set of clusters so that if distance between two arbitrary points is less then some threshold, these points must belong to the same cluster: <img src="https://i.stack.imgur.com/UjiZX.png" alt="sample clusters"> In other words, such cluster is a set of points which are 'close enough' to each other. The naive algorithm may look like this: <ol> <li>Let R be a resulting list of clusters, initially empty</li> <li>Let P be a list of points, initially contains all points</li> <li>Pick random point from P and create a cluster C which contains only this point. Delete this point from P </li> <li>For every point Pi from P 4a. For every point Pc from C 4aa. If distance(Pi, Pc) < threshold then add Pi to C and remove it from P </li> <li>If at least one point was added to cluster C during the step 4, go to step 4</li> <li>Add cluster C to list R. if P is not empty, go to step 3</li> </ol> However, naive approach is very inefficient. I wonder if there is a better algorithm for this problem? P.S. I don't know the number of clusters apriori

There are some classic algorithms here: <ul> <li>Hierarchical Agglomerative Clustering</li> <li>DBSCAN</li> </ul> that you should read and understand.

<ol> <li>Split up the space of points into a grid. This grid would have unit length equal to threshhold / sqrt(8). </li> <li>Iterate though the list of points P, adding each point to both the square it occupies and a new cluster. If a point is added to a square which already contains a point, add it to the cluster of the other point(s). I'll call the list of all occupied sqaures S.</li> <li>Now take any square from S and its cluster c. For each adjacent or diagonal square, combine the cluster of that square with c and remove the square from S. Repeat the process for all squares just added.</li> <li>Once no more adjacent sqaures can be found, the cluster is finished and can be added to C. Repeat step 3 with any remaining squares in S. When S is empty, you're finished.</li> </ol>

Efficient algorithm to group points in clusters by distance between every two points

Q: Which clustering algorithm is best?

k-means is the most widely-used centroid-based clustering algorithm. Centroid-based algorithms are efficient but sensitive to initial conditions and outliers. This course focuses on k-means because it is an efficient, effective, and simple clustering algorithm.

Q: How do you find the distance between two clusters?

In Average linkage clustering, the distance between two clusters is defined as the average of distances between all pairs of objects, where each pair is made up of one object from each group. D(r,s) = Trs / ( Nr * Ns) Where Trs is the sum of all pairwise distances between cluster r and cluster s.

Q: Which clustering algorithm will you use to deal with a large data set?

CLARA (clustering large applications.) It is a sample-based method that randomly selects a small subset of data points instead of considering the whole observations, which means that it works well on a large dataset.

Tags:

algorithm

machine-learning

cluster-analysis

data-mining

I am looking for an efficient algorithm for the following problem:

Given a set of points in 2D space, where each point is defined by its X and Y coordinates. Required to split this set of points into a set of clusters so that if distance between two arbitrary points is less then some threshold, these points must belong to the same cluster:

sample clusters

In other words, such cluster is a set of points which are 'close enough' to each other.

The naive algorithm may look like this:

Let R be a resulting list of clusters, initially empty
Let P be a list of points, initially contains all points
Pick random point from P and create a cluster C which contains only this point. Delete this point from P
For every point Pi from P 4a. For every point Pc from C 4aa. If distance(Pi, Pc) < threshold then add Pi to C and remove it from P
If at least one point was added to cluster C during the step 4, go to step 4
Add cluster C to list R. if P is not empty, go to step 3

However, naive approach is very inefficient. I wonder if there is a better algorithm for this problem?

P.S. I don't know the number of clusters apriori

767

asked Sep 06 '15 21:09

ovk

2 Answers

There are some classic algorithms here:

Hierarchical Agglomerative Clustering
DBSCAN

that you should read and understand.

answered Oct 20 '22 12:10

Has QUIT--Anony-Mousse

Split up the space of points into a grid. This grid would have unit length equal to threshhold / sqrt(8).
Iterate though the list of points P, adding each point to both the square it occupies and a new cluster. If a point is added to a square which already contains a point, add it to the cluster of the other point(s). I'll call the list of all occupied sqaures S.
Now take any square from S and its cluster c. For each adjacent or diagonal square, combine the cluster of that square with c and remove the square from S. Repeat the process for all squares just added.
Once no more adjacent sqaures can be found, the cluster is finished and can be added to C. Repeat step 3 with any remaining squares in S. When S is empty, you're finished.

answered Oct 20 '22 10:10

crb233

Related questions
                            
                                How to label "transitive groups" with SQL?
                            
                                Fast exact bigint factorial
                            
                                B-Tree for on-disk storage
                            
                                An algorithm to find the seed root of a given number
                            
                                Algorithm for finding a group of numbers in a list that equal a target
                            
                                in a graph, is O(|E|*|V|) complexity considered polynomial or not?
                            
                                algorithm LRU, how many bits needed for implement this algorithm?
                            
                                Balancing a binary search tree
                            
                                Normalizing a random unending unknown series?
                            
                                Union of two network diagrams
                            
                                Jacobian matrix computation for artificial neural networks
                            
                                Implementing a depth-first tree iterator in Python
                            
                                Recursive Karatsuba multiplication not working?
                            
                                Time Complexity of Singular Value Decomposition
                            
                                What algorithm is this? Best way to distribute limited resources
                            
                                Floyd–Rivest vs. Introselect algorithm performance
                            
                                c# 3D Bin Packing for shipping [closed]
                            
                                Where is the logic wrong in my code?
                            
                                Array of random numbers with sum in given range?
                            
                                Which algorithms are there to find the Smallest Set of Smallest Rings?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With