Python Clustering Algorithms

I've been looking through scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of partitioning a population of N particles into k groups, where k is not necessarily known, and in addition, no a priori linking lengths are known (similar to this question).

I've tried kmeans, which works well if you know how many clusters you want. I've tried dbscan, which does poorly unless you give it a characteristic length scale on which to stop (or start) looking for clusters. The problem is that I have potentially thousands of these clusters of particles, and I cannot spend the time telling the kmeans/dbscan algorithms what they should go on for each one.

Here is an example of what dbscan finds: [image: dbscanfail]

You can see that there really are two separate populations here, but however I adjust the epsilon parameter (the maximum distance between two points for them to count as neighbors), I simply cannot get it to see those two populations of particles.
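To make the epsilon sensitivity concrete, here is a minimal sketch on synthetic data. It is not the particle data from the question: the two blobs, their spreads, the `eps` values tried, and the helper name `count_clusters` are all assumptions chosen for illustration.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the particles: two populations with different spreads.
X, _ = make_blobs(
    n_samples=[300, 300],
    centers=[[0.0, 0.0], [6.0, 0.0]],
    cluster_std=[0.5, 1.5],
    random_state=0,
)

def count_clusters(points, eps, min_samples=5):
    """Run DBSCAN and count the clusters found, ignoring noise (label -1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return len(set(labels) - {-1})

# The number of clusters reported depends strongly on the chosen eps.
cluster_counts = {eps: count_clusters(X, eps) for eps in (0.05, 0.3, 1.0, 10.0)}
for eps, n in cluster_counts.items():
    print(f"eps={eps}: {n} cluster(s)")
```

With a very large `eps` everything merges into a single cluster, while a very small `eps` tends to label most points as noise, so a single global length scale has to be tuned per data set.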

Are there any other algorithms that would work here? I'm looking for minimal information up front; in other words, I'd like the algorithm to be able to make "smart" decisions about what could constitute a separate cluster.

363

asked Nov 13 '13 14:11

astromax

1 Answer

I've found one that requires NO a priori information/guesses and does very well for what I'm asking it to do. It's called Mean Shift and is available in scikit-learn. It's also relatively quick (compared to other algorithms like Affinity Propagation).

Here's an example of what it gives:

[image: MeanShiftResults]
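For reference, a minimal sketch of calling Mean Shift in scikit-learn might look like the following. The synthetic two-blob data is an assumption standing in for the particles. Note that Mean Shift does have a `bandwidth` parameter; when it is left unset, scikit-learn estimates it from the data with `sklearn.cluster.estimate_bandwidth`, which is why no manual guess is needed.

```python
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Hypothetical stand-in data: two well-separated populations.
X, _ = make_blobs(
    n_samples=[200, 200],
    centers=[[0.0, 0.0], [8.0, 8.0]],
    cluster_std=0.6,
    random_state=0,
)

# bandwidth is left unset, so it is estimated from the data automatically.
ms = MeanShift()
labels = ms.fit_predict(X)

# The number of clusters is an output of the fit, not an input.
n_clusters = len(ms.cluster_centers_)
print("clusters found:", n_clusters)
```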

I also want to point out that the documentation states that it may not scale well.
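If scaling becomes a problem, two options that scikit-learn exposes may help: estimating the bandwidth on a subsample (`n_samples` in `estimate_bandwidth`) and seeding from a coarse grid of occupied bins instead of from every point (`bin_seeding=True`). The sketch below uses assumed synthetic data and assumed parameter values, and whether it is fast enough for a given data set would need to be measured.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Hypothetical larger data set: two populations of 1000 points each.
X, _ = make_blobs(
    n_samples=[1000, 1000],
    centers=[[0.0, 0.0], [8.0, 8.0]],
    cluster_std=0.6,
    random_state=0,
)

# Estimate the bandwidth from a 500-point subsample rather than all points.
bandwidth = estimate_bandwidth(X, n_samples=500, random_state=0)

# bin_seeding=True starts from binned grid locations, reducing the number of seeds.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
print("bandwidth:", bandwidth, "clusters found:", len(ms.cluster_centers_))
```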

117

answered Sep 28 '22 17:09

astromax

Tags: cluster-analysis, k-means, dbscan
