How to speed up k-means from scikit-learn?

In my project I use k-means to split data into groups, but the k-means implementation from scikit-learn is very slow on my data. I need to speed it up.

I have tried setting n_jobs to -1, but it is still very slow!

Any suggestions on how to speed it up?

user8058941 asked Oct 01 '17 18:10



1 Answer

The main solution in scikit-learn is to switch to mini-batch k-means (`MiniBatchKMeans`), which greatly reduces the computational cost. To some extent it is analogous to SGD (Stochastic Gradient Descent) vs. GD (Gradient Descent) for optimising non-linear functions: SGD is usually faster in terms of the computational cycles needed to converge to a local solution. Note that this introduces more variance into the optimisation, so results may be harder to reproduce (the optimisation will end up in different solutions more often than with "full batch" k-means).
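A minimal sketch of the switch, assuming a dense numeric feature matrix. The synthetic `make_blobs` data, the cluster count, and the `batch_size` value are placeholders chosen for illustration, not values from the question. Fixing `random_state` is one way to make the extra variance mentioned above easier to live with.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (assumption: 20,000 rows, 10 features).
X, _ = make_blobs(n_samples=20000, n_features=10, centers=5, random_state=0)

# Each update step uses only `batch_size` samples instead of the full dataset.
# n_clusters and batch_size are illustrative; tune them for your own data.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=0)

# Fit on the data and get one cluster label per row.
labels = mbk.fit_predict(X)

print(labels.shape)                # one label per sample
print(mbk.cluster_centers_.shape)  # one centre per cluster
```

A larger `batch_size` generally moves the result closer to full-batch k-means at the price of more work per step, so it is worth timing a few values on a sample of the real data.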

lejlot answered Oct 04 '22 14:10
