How to update an SVM model with new data

Tags: python, machine-learning, numpy, computer-vision, scikit-learn

I have two datasets of different sizes.

1) Dataset 1: high-dimensional, 4500 samples (sketches).

2) Dataset 2: low-dimensional, 1000 samples (real data). I assume that both datasets have the same distribution.

I want to train a non-linear SVM model with sklearn on the first dataset (as pre-training), and then update the model on part of the second dataset (to fine-tune it). How can I implement this kind of update in sklearn? How can I update an SVM model?

asked Feb 18 '16 21:02

Jeanne

1 Answer

In sklearn you can do this only for a linear kernel, using SGDClassifier with an appropriate choice of loss and penalty (the loss should be hinge and the penalty L2). Incremental learning is supported through the partial_fit method, which is implemented for neither SVC nor LinearSVC.
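
A minimal sketch of that approach, assuming both datasets share the same feature columns (a single linear model needs a fixed input width). The data below is synthetic, generated with make_classification purely as a hypothetical stand-in for the sketches and the real data; the number of passes and the 500-sample update slice are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-ins for the two datasets. Both must have the SAME number
# of features, because one linear model is shared between them.
X, y = make_classification(n_samples=5500, n_features=20, random_state=0)
X1, y1 = X[:4500], y[:4500]   # "dataset 1" (pre-training data)
X2, y2 = X[4500:], y[4500:]   # "dataset 2" (update data)

# hinge loss + L2 penalty: SGDClassifier optimises a linear SVM objective
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, random_state=0)

# The first partial_fit call must be given every class that can ever appear
classes = np.unique(y)
for _ in range(5):  # each partial_fit call is one pass over dataset 1
    clf.partial_fit(X1, y1, classes=classes)

# Later: update the same model with part of dataset 2
clf.partial_fit(X2[:500], y2[:500])

# Evaluate on the part of dataset 2 the model has not seen
print("held-out accuracy:", clf.score(X2[500:], y2[500:]))
```

Note that this is still a linear model; it does not give you the non-linear (kernel) SVM asked about in the question.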

Unfortunately, in practice, fitting an SVM incrementally on such small datasets is rather useless. An SVM has an easily obtainable global solution, so you do not need pretraining of any form; in fact, if you are thinking of pretraining in the neural-network sense, it should not matter at all. A correctly implemented SVM should completely forget the previous dataset. Why not learn on the whole data in one pass? That is what an SVM is designed to do. The exception is when you are working with some non-convex modification of SVM, where pretraining does make sense.
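
The "forgetting" can be checked directly in sklearn: calling fit on an SVC that was already fitted on one dataset simply retrains from scratch, so the result matches a fresh model trained on the second dataset alone. A small sketch, with two hypothetical synthetic datasets standing in for the real ones:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Two hypothetical datasets with the same number of features
X_a, y_a = make_classification(n_samples=300, n_features=10, random_state=1)
X_b, y_b = make_classification(n_samples=200, n_features=10, random_state=2)

# "Pretrain" on A, then fit again on B
reused = SVC(kernel="rbf", C=1.0, gamma="scale")
reused.fit(X_a, y_a)
reused.fit(X_b, y_b)  # fit() starts over; nothing learned from A is kept

# A model trained on B only
fresh = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_b, y_b)

# Both models produce the same decision function on B
same = np.allclose(reused.decision_function(X_b), fresh.decision_function(X_b))
print(same)
```

Because the SVM training problem is convex, the solution on B does not depend on where the solver started, so a warm start from A could at most save time, not change the model.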

To sum up:

1) From both a theoretical and a practical point of view, there is no point in pretraining an SVM. You can either learn on the second dataset alone or on both at the same time. Pretraining is only reasonable for methods that suffer from local minima (or hard convergence of any kind) and therefore need to start near the actual solution to find a reasonable model (like neural networks). SVM is not one of them.

2) You can use incremental fitting for efficiency reasons (although in sklearn it is very limited), but for such a small dataset you will be just fine fitting the whole dataset at once.
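
Both options from the summary can be sketched with a plain kernel SVC. This assumes the two datasets have already been brought to the same feature representation (np.vstack requires equal column counts); the data is a hypothetical synthetic stand-in, and the 30% held-out split of dataset 2 is an arbitrary choice for evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-ins: 4500 "sketch" samples and 1000 "real" samples
X, y = make_classification(n_samples=5500, n_features=20, random_state=0)
X1, y1 = X[:4500], y[:4500]
X2, y2 = X[4500:], y[4500:]

# Keep part of dataset 2 aside to evaluate both options fairly
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.3, random_state=0)

# Option 1: learn only on dataset 2
only_second = SVC(kernel="rbf", gamma="scale").fit(X2_train, y2_train)

# Option 2: learn on both datasets in a single fit
X_all = np.vstack([X1, X2_train])
y_all = np.concatenate([y1, y2_train])
both = SVC(kernel="rbf", gamma="scale").fit(X_all, y_all)

print("dataset 2 only:", only_second.score(X2_test, y2_test))
print("both datasets: ", both.score(X2_test, y2_test))
```

Comparing the two scores on held-out real data is a simple way to decide whether the sketches actually help.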

answered Nov 05 '22 18:11

lejlot
