I am currently in the process of designing a recommender system for text articles (a binary case of 'interesting' or 'not interesting'). One of my specifications is that it should continuously update to changing trends.
From what I can tell, the best way to do this is to use a machine learning algorithm that supports incremental/online learning.
Algorithms like the Perceptron and Winnow support online learning, but I am not completely certain about support vector machines. Does the scikit-learn Python library support online learning, and if so, is a support vector machine one of the algorithms that can make use of it?
I am obviously not completely tied to using support vector machines, but they are usually the go-to algorithm for binary classification due to their all-round performance. I would be willing to change to whatever fits best in the end.
Support vector machines (SVMs) are powerful and adaptable supervised learning algorithms used for classification, regression, and outlier detection. scikit-learn's SVMs are commonly employed in classification tasks because they are particularly effective in high-dimensional spaces.
Incremental support vector machines are instrumental in practical online-learning applications; work in this area focuses on designing incremental SVM training that is fast, numerically stable, and robust.
No, scikit-learn's standard kernel SVM (SVC) does not support incremental updates. The SVM algorithm relies on the kernel trick: the kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one, which is mostly useful for non-linear separation problems.
Advantages of support vector machines: they work comparably well when there is a clear margin of separation between classes, they are effective in high-dimensional spaces, and they remain effective when the number of dimensions is larger than the number of samples.
While online algorithms for SVMs do exist, it has become important to specify whether you want a kernel or a linear SVM, since many efficient algorithms have been developed for the special case of linear SVMs.
For the linear case, if you use the SGDClassifier in scikit-learn with the hinge loss and L2 regularization, you will get an SVM that can be updated online/incrementally. You can combine this with feature transforms that approximate a kernel to get something similar to an online kernel SVM.
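For illustration, here is a rough sketch of that setup. The batch variables (X1, y1, X2, y2) and the RBFSampler parameters are placeholders you would choose for your own data; they are not part of any fixed recipe:

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.kernel_approximation import RBFSampler

    # hinge loss + L2 penalty makes SGDClassifier behave like a linear SVM
    clf = SGDClassifier(loss="hinge", penalty="l2")

    # optional: approximate an RBF kernel so the online linear model
    # behaves more like a kernel SVM (gamma/n_components need tuning)
    rbf = RBFSampler(gamma=1.0, n_components=500, random_state=0)
    rbf.fit(X1)  # only needs to see the feature dimensionality once

    # first batch: the classes must be listed on the first partial_fit call
    clf.partial_fit(rbf.transform(X1), y1, classes=np.array([0, 1]))

    # later batches arrive as trends change; just keep calling partial_fit
    clf.partial_fit(rbf.transform(X2), y2)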
"One of my specifications is that it should continuously update to changing trends."
This is referred to as concept drift, and will not be handled well by a simple online SVM. Using the PassiveAggressive classifier will likely give you better results, as its learning rate does not decrease over time.
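A minimal sketch of how that might look, again with placeholder batches X1, y1, X2, y2 and illustrative binary labels 0/1:

    import numpy as np
    from sklearn.linear_model import PassiveAggressiveClassifier

    # the passive-aggressive update does not shrink its step size over time,
    # so newer batches keep influencing the model as trends drift
    pa = PassiveAggressiveClassifier()

    pa.partial_fit(X1, y1, classes=np.array([0, 1]))  # first batch
    pa.partial_fit(X2, y2)                            # later batches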
Assuming you get feedback while training/running, you can attempt to detect decreases in accuracy over time and begin training a new model when the accuracy starts to decrease (and switch to the new one when you believe it has become more accurate). JSAT has two drift-detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.
It also has more online linear and kernel methods.
(bias note: I'm the author of JSAT).
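JSAT is a Java library, but if you stay in scikit-learn you could approximate the same retrain-on-drift idea with a crude sketch like the one below. This is not one of JSAT's drift detectors, just an illustration: the window size, the 10-point accuracy drop, the 0/1 labels, and the batch variables (X1, y1, X, y) are all arbitrary placeholders.

    from collections import deque

    import numpy as np
    from sklearn.linear_model import PassiveAggressiveClassifier

    CLASSES = np.array([0, 1])      # the two labels of the binary problem
    window = deque(maxlen=500)      # rolling record of recent hits/misses
    best_acc = 0.0                  # best accuracy seen by the current model
    clf = PassiveAggressiveClassifier()
    clf.partial_fit(X1, y1, classes=CLASSES)   # X1, y1: an initial labelled batch

    def process_batch(X, y):
        """Score a labelled batch, watch for an accuracy drop, then learn from it."""
        global clf, best_acc
        window.extend(clf.predict(X) == y)     # did the current model get them right?
        acc = float(np.mean(window))
        best_acc = max(best_acc, acc)
        if best_acc - acc > 0.10:              # crude drift signal: a 10-point drop
            clf = PassiveAggressiveClassifier()  # start a fresh model on newer data
            window.clear()
            best_acc = 0.0
        clf.partial_fit(X, y, classes=CLASSES)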
Maybe it's me being naive, but I think it is worth mentioning how to actually update the scikit-learn SGD classifier when you present your data incrementally:
    import numpy as np
    from sklearn import linear_model

    clf = linear_model.SGDClassifier()

    x1 = some_new_data     # first batch of feature vectors
    y1 = the_labels        # their labels
    # the first call to partial_fit must be told every class that can occur
    # (np.unique(y1) works as long as both classes appear in this batch)
    clf.partial_fit(x1, y1, classes=np.unique(y1))

    x2 = some_newer_data   # a later batch
    y2 = the_labels
    clf.partial_fit(x2, y2)
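Between those partial_fit calls the model is immediately usable, so incoming articles can be scored with clf.predict (or ranked by clf.decision_function) using whatever the classifier has learned so far.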