Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does the SVM in sklearn support incremental (online) learning?

I am currently in the process of designing a recommender system for text articles (a binary case of 'interesting' or 'not interesting'). One of my specifications is that it should continuously update to changing trends.

From what I can tell, the best way to do this is to make use of machine learning algorithm that supports incremental/online learning.

Algorithms like the Perceptron and Winnow support online learning but I am not completely certain about Support Vector Machines. Does the scikit-learn python library support online learning and if so, is a support vector machine one of the algorithms that can make use of it?

I am obviously not completely tied down to using support vector machines, but they are usually the go to algorithm for binary classification due to their all round performance. I would be willing to change to whatever fits best in the end.

like image 923
Michael Aquilina Avatar asked Apr 14 '14 09:04

Michael Aquilina


People also ask

What is Sklearn SVM used for?

Support vector machines (SVMs) are supervised machine learning algorithms for outlier detection, regression, and classification that are both powerful and adaptable. Sklearn SVMs are commonly employed in classification tasks because they are particularly efficient in high-dimensional fields.

What is incremental SVM?

Incremental Support Vector Machines (SVM) are instrumental in practical applications of online learning. This work focuses on the design and analysis of efficient incremental SVM learning, with the aim of providing a fast, numerically stable and robust implementation.

Which learning technique does SVM use?

No, the SVM algorithm has a technique called the kernel trick. The SVM kernel is a function that takes low dimensional input space and transforms it to a higher dimensional space i.e. it converts not separable problem to separable problem. It is mostly useful in non-linear separation problem.

What are the benefits of using the SVM model?

Advantages of support vector machine : Support vector machine works comparably well when there is an understandable margin of dissociation between classes. It is more productive in high dimensional spaces. It is effective in instances where the number of dimensions is larger than the number of specimens.


2 Answers

While online algorithms for SVMs do exist, it has become important to specify if you want kernel or linear SVMs, as many efficient algorithms have been developed for the special case of linear SVMs.

For the linear case, if you use the SGD classifier in scikit-learn with the hinge loss and L2 regularization you will get an SVM that can be updated online/incrementall. You can combine this with feature transforms that approximate a kernel to get similar to an online kernel SVM.

One of my specifications is that it should continuously update to changing trends.

This is referred to as concept drift, and will not be handled well by a simple online SVM. Using the PassiveAggresive classifier will likely give you better results, as it's learning rate does not decrease over time.

Assuming you get feedback while training / running, you can attempt to detect decreases in accuracy over time and begin training a new model when the accuracy starts to decrease (and switch to the new one when you believe that it has become more accurate). JSAT has 2 drift detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.

It also has more online linear and kernel methods.

(bias note: I'm the author of JSAT).

like image 77
Raff.Edward Avatar answered Sep 28 '22 03:09

Raff.Edward


Maybe it's me being naive but I think it is worth mentioning how to actually update the sci-kit SGD classifier when you present your data incrementally:

clf = linear_model.SGDClassifier() x1 = some_new_data y1 = the_labels clf.partial_fit(x1,y1) x2 = some_newer_data y2 = the_labels clf.partial_fit(x2,y2) 
like image 20
Jariani Avatar answered Sep 28 '22 03:09

Jariani