
How can I distribute the processing of MiniBatchKMeans (scikit-learn)?

In scikit-learn, KMeans has an n_jobs parameter, but MiniBatchKMeans (MBK) lacks it. MBK is faster than KMeans, but on large sample sets we would like to distribute the processing with multiprocessing (or another parallel processing library).

Is MBK's partial_fit the answer?
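
For context, here is a minimal sketch of how partial_fit is typically called: the data is fed to one model chunk by chunk. The toy data, the chunk count, and the variable names are assumptions made for illustration. Note that the loop is sequential, since every call updates the same cluster centers, so this shows incremental fitting rather than parallel fitting.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)

# Hypothetical toy data: three well-separated 2-D blobs, 500 points each.
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.randn(500, 2) for c in blob_centers])
rng.shuffle(X)  # shuffle rows so each chunk mixes all blobs

mbk = MiniBatchKMeans(n_clusters=3, random_state=0)

# Feed the data in 15 chunks of 100 rows. Each call refines the
# centers left behind by the previous call, so order matters and
# the calls cannot simply run side by side on one model.
for chunk in np.array_split(X, 15):
    mbk.partial_fit(chunk)

labels = mbk.predict(X)
```

In a real setting the chunks could come from disk or a stream, which is the out-of-core use case partial_fit is aimed at.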

Phyo Arkar Lwin asked Jun 11 '13 20:06



1 Answer

I don't think this is possible. You could implement something with OpenMP inside the minibatch processing, but I'm not aware of any parallel mini-batch k-means procedures. Parallelizing stochastic gradient descent procedures is somewhat hairy, because each update depends on the state left by the previous one.
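
To illustrate why this is awkward to parallelize, here is a simplified pure-Python sketch of the per-center update usually described for mini-batch k-means (per-center counts with a 1/count learning rate). It is not scikit-learn's actual code, and the function names `nearest` and `minibatch_update` are hypothetical. The point is that `centers` and `counts` are shared mutable state that every sample reads and writes.

```python
def nearest(centers, x):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(centers[k], x)),
    )


def minibatch_update(centers, counts, batch):
    """Apply one mini-batch of updates to centers and counts in place."""
    # Assign every sample using the centers as they are at batch start.
    assignments = [nearest(centers, x) for x in batch]
    for x, j in zip(batch, assignments):
        counts[j] += 1
        eta = 1.0 / counts[j]  # per-center learning rate, shrinks over time
        # Move center j toward x; the result depends on all earlier updates.
        centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers, counts


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]
minibatch_update(centers, counts, [[1.0, 1.0], [9.0, 9.0], [3.0, 1.0]])
print(centers)  # [[2.0, 1.0], [9.0, 9.0]]
print(counts)   # [2, 1]
```

Two workers running this on different batches at the same time would race on `centers` and `counts`, so a parallel version needs locking or some scheme for merging partial results.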

Btw, as far as I know, the n_jobs parameter in KMeans only distributes the different random initializations across jobs.

Andreas Mueller answered Oct 31 '22 18:10
