In scikit-learn, KMeans has an n_jobs parameter, but MiniBatchKMeans lacks it. MiniBatchKMeans (MBK) is faster than KMeans, but on large sample sets we would like to distribute the processing across multiprocessing (or another parallel processing library).
Is MBK's partial_fit the answer?
I don't think this is possible. You could implement something with OpenMP inside the minibatch processing, but I'm not aware of any parallel minibatch k-means procedures. Parallelizing stochastic gradient descent procedures is somewhat hairy.
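To illustrate why this is hairy, here is a minimal pure-Python sketch of one mini-batch k-means step in the commonly described form with a per-center learning rate. It is not scikit-learn's actual implementation, and the names `nearest` and `minibatch_step` are made up for this example. Each update reads and writes the shared centers and counts, so batches processed concurrently would step on each other unless they are synchronized.

```python
def nearest(centers, x):
    """Index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - xi) ** 2 for c, xi in zip(centers[j], x)),
    )


def minibatch_step(centers, counts, batch):
    """Update centers and counts in place using one mini-batch."""
    # Assign every sample using the centers as they are at the start of the batch.
    assigned = [nearest(centers, x) for x in batch]
    for x, j in zip(batch, assigned):
        counts[j] += 1
        eta = 1.0 / counts[j]  # per-center learning rate shrinks as the center sees more samples
        # Move the center toward the sample: c <- (1 - eta) * c + eta * x
        centers[j] = [(1.0 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]
minibatch_step(centers, counts, [[1.0, 1.0], [9.0, 9.0], [3.0, 3.0]])
```

Because `eta` depends on the running `counts`, the result depends on the order in which samples and batches arrive, which is the part that does not split cleanly across workers.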
Btw, the n_jobs parameter in KMeans only distributes the different random initializations, afaik.
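As for partial_fit: it updates the model with one chunk at a time, which helps when the data does not fit in memory, but the calls still run one after another, so it does not parallelize anything by itself. A rough sketch, assuming synthetic data and an arbitrary split into 20 chunks:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data for illustration only: three Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(1000, 2)) for c in (-5.0, 0.0, 5.0)]
)
rng.shuffle(X)  # shuffle rows so every chunk contains a mix of blobs

mbk = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)

# Feed the data chunk by chunk; each call updates the same centers sequentially.
for chunk in np.array_split(X, 20):
    mbk.partial_fit(chunk)

labels = mbk.predict(X)
```

In a real out-of-core setting the chunks would come from disk or a stream rather than from an in-memory array.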