Which algorithm and what combination of hyper-parameters will be the best to cluster this data?

I was learning about non-linear clustering algorithms and came across this 2-D plot. Which clustering algorithm, and which combination of hyper-parameters, would cluster this data well?

Plot

A human would group the points into those 5 spikes, and I want my algorithm to do the same. I tried KMeans, but it only split the data horizontally or vertically. I then tried GMM but couldn't find hyper-parameters that give the desired clustering.

asked May 31 '19 12:05

rrm_2016

2 Answers

If a clustering doesn't work, always try to improve the preprocessing first. Algorithms such as k-means are very sensitive to scaling, so the scaling needs to be chosen carefully.
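As a minimal sketch of the scaling point, the snippet below standardizes each feature before running k-means with scikit-learn. The data is a hypothetical stand-in (the original plot's data is not available), and standardization is only one possible choice of scaling; it may not be the right one for the spikes in the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, 300),      # feature on a unit scale
    rng.normal(0.0, 1000.0, 300),   # feature on a much larger scale
])

# Standardize each feature to zero mean and unit variance, then run k-means.
# n_clusters=5 mirrors the five spikes mentioned in the question.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=5, n_init=10, random_state=0),
)
labels = model.fit_predict(X)

# Inspect what k-means actually saw after scaling.
X_scaled = model.named_steps["standardscaler"].transform(X)
print(X_scaled.std(axis=0))
```

Without the scaler, the second feature would dominate the Euclidean distances and k-means would effectively ignore the first one.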

GMM is clearly your first choice here. It may be worth trying out different tools: R's Mclust is very slow, and Sklearn's GMM is sometimes unstable. ELKI is a bit harder to get started with, but its EM implementation usually gave me the best results.
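A rough sketch of the GMM suggestion using scikit-learn's GaussianMixture follows. The `make_spikes` helper and its data are hypothetical, invented here to imitate five elongated spikes. The settings are assumptions rather than tuned values: `covariance_type="full"` lets each component stretch along its spike, and `n_init=10` reruns EM from several initializations, which may help with the instability mentioned above.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture


def make_spikes(n_spikes=5, n_per_spike=200, seed=0):
    """Hypothetical data: thin, elongated spikes fanning out from the origin."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.2, np.pi - 0.2, n_spikes)
    points, truth = [], []
    for k, a in enumerate(angles):
        r = rng.uniform(1.0, 10.0, n_per_spike)        # position along the spike
        noise = rng.normal(0.0, 0.15, n_per_spike)     # thickness across the spike
        x = r * np.cos(a) - noise * np.sin(a)
        y = r * np.sin(a) + noise * np.cos(a)
        points.append(np.column_stack([x, y]))
        truth.append(np.full(n_per_spike, k))
    return np.vstack(points), np.concatenate(truth)


X, truth = make_spikes()

# Full covariance matrices allow each Gaussian to be long and thin.
gmm = GaussianMixture(
    n_components=5,
    covariance_type="full",
    n_init=10,
    random_state=0,
)
labels = gmm.fit_predict(X)

# Agreement with the synthetic ground truth (1.0 means a perfect match).
print(adjusted_rand_score(truth, labels))
```

On real data there is no ground truth to compare against, so plotting the points colored by `labels` is the practical check.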

Apart from GMM, it is likely worth trying out correlation clustering. These algorithms assume that each cluster lies on some manifold (e.g., a line). Examples include ORCLUS, LMCLUS, CASH, 4C, ... But in my opinion these mostly work for synthetic toy data.

answered Sep 22 '22 16:09

Has QUIT--Anony-Mousse

I suggest trying hierarchical clustering. In the agglomerative approach, each point starts out as its own cluster, and clusters are then merged step by step based on their distances from each other.
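The agglomerative approach can be sketched with scikit-learn's AgglomerativeClustering. The data below is hypothetical (five thin vertical stripes), and the choice of `linkage="single"` is an assumption made here, not part of the answer: single linkage merges clusters by their closest pair of points, which tends to follow elongated shapes but is sensitive to noise points bridging two clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in data: five thin, well-separated vertical stripes.
rng = np.random.default_rng(0)
stripes = []
for center_x in (0.0, 2.0, 4.0, 6.0, 8.0):
    x = center_x + rng.normal(0.0, 0.1, 200)   # small horizontal spread
    y = rng.uniform(0.0, 10.0, 200)            # long vertical extent
    stripes.append(np.column_stack([x, y]))
X = np.vstack(stripes)

# Start from one cluster per point and merge until five clusters remain.
agg = AgglomerativeClustering(n_clusters=5, linkage="single")
labels = agg.fit_predict(X)

# Cluster sizes, to spot degenerate results such as single-point clusters.
print(np.bincount(labels))
```

Other linkage options ("ward", "complete", "average") favor more compact clusters, so they may split long spikes instead of keeping them whole.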

answered Sep 19 '22 16:09

Abhineet Gupta

Tags:

cluster-analysis

k-means

unsupervised-learning

data-science

gmm