Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scikit-learn kmeans custom distance [duplicate]

I looking to use the kmeans algorithm to cluster some data, but I would like to use a custom distance function. Is there any way I can change the distance function that is used by scikit-learn?

I would also settle for a different framework / module that would allow exchanging the distance function and can calculate the kmeans in parallel (I would like to speed up the calculation, which is a nice feature from scikit-learn)

Any suggestions?

like image 377
Nils Ziehn Avatar asked Jun 29 '15 23:06

Nils Ziehn


People also ask

Does SKlearn K-Means use euclidean distance?

Sklearn Kmeans uses the Euclidean distance. It has no metric parameter.

What is Inertia_ in K-Means?

K-Means: Inertia Inertia measures how well a dataset was clustered by K-Means. It is calculated by measuring the distance between each data point and its centroid, squaring this distance, and summing these squares across one cluster. A good model is one with low inertia AND a low number of clusters ( K ).

Can we use Manhattan distance in K-Means?

If the manhattan distance metric is used in k-means clustering, the algorithm still yields a centroid with the median value for each dimension, rather than the mean value for each dimension as for Euclidean distance.

What is tot Withinss in K-Means?

tot. withinss : Total within-cluster sum of squares, i.e. sum(withinss). betweenss : The between-cluster sum of squares, i.e. $totss-tot. withinss$. size : The number of points in each cluster.


1 Answers

You could try spectral clustering algorithm which allows you to input your own distance matrix (calculated as you like).

Its performance has nothing to envy to K-means on convex boundaries, but does also the job on non-convex problems (detects connectivity). See more here.

The good news is that spectral clustering is also implemented in scikit-learn.

Hope it helps.

like image 185
gowithefloww Avatar answered Oct 01 '22 00:10

gowithefloww