scikit-learn SVM with a lot of samples / mini batch possible?

In the scikit-learn documentation for SVC (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) I read:

"The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples."

I currently have 350,000 samples and 4,500 classes, and these numbers will grow further to 1-2 million samples and 10k+ classes.

My problem is that I am running out of memory. Everything works as it should when I use only 200,000 samples with fewer than 1,000 classes.

Is there a built-in way, or some other approach, to use something like mini-batches with SVM? I saw that MiniBatchKMeans exists, but I don't think it works for SVM?

Any input welcome!

asked Nov 22 '16 by domi771

1 Answer

I mentioned this problem in my answer to this question.

You can split your large dataset into batches that an SVM algorithm can safely consume, find the support vectors for each batch separately, and then build the final SVM model on a dataset consisting of all the support vectors found across the batches.
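A minimal sketch of that idea, using toy data from `make_classification` as a stand-in for the real dataset (batch count and kernel choice are illustrative assumptions, not part of the original answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for a large dataset (sizes are illustrative).
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=5, random_state=0)

n_batches = 4
sv_X, sv_y = [], []
for X_batch, y_batch in zip(np.array_split(X, n_batches),
                            np.array_split(y, n_batches)):
    # Fit an SVM on this batch only.
    clf = SVC(kernel="rbf").fit(X_batch, y_batch)
    # Keep just this batch's support vectors and their labels.
    sv_X.append(X_batch[clf.support_])
    sv_y.append(y_batch[clf.support_])

# Train the final model on the union of all support vectors,
# which is typically much smaller than the full dataset.
final_clf = SVC(kernel="rbf").fit(np.vstack(sv_X), np.concatenate(sv_y))
```

Note that this is an approximation: a support vector of the full problem is not guaranteed to be a support vector of its batch, so shuffling the data before splitting (so each batch sees all classes) matters in practice.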

Also, if there is no need for kernels in your case, you can use sklearn's SGDClassifier, which implements stochastic gradient descent. With its default hinge loss it fits a linear SVM, and its partial_fit method lets you train on mini-batches out of core.
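A short sketch of mini-batch training with SGDClassifier's `partial_fit` (again on toy data; the batch size is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Toy data standing in for a dataset too large to fit at once.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=5, random_state=0)

# partial_fit must be told the full set of classes up front,
# since any single batch may not contain all of them.
classes = np.unique(y)

clf = SGDClassifier(loss="hinge", random_state=0)  # hinge loss = linear SVM
for X_batch, y_batch in zip(np.array_split(X, 10),
                            np.array_split(y, 10)):
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

In a real out-of-core setting each batch would be loaded from disk inside the loop, so only one batch ever resides in memory at a time.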

answered Sep 25 '22 by Sergey Zakharov