Scaling data in scikit-learn SVM

Tags: python, svm, scikit-learn, libsvm

libsvm ships with tools for scaling data, but in scikit-learn (whose SVC classifier is built on libsvm) I can't find a way to scale my data.

Basically I want to use 4 features, of which 3 range from 0 to 1 and the last one is a "big" highly variable number.

If I include the fourth feature in libSVM (using the easy.py script which scales my data automatically) I get some very nice results (96% accuracy). If I include the fourth variable in Scikit-Learn the accuracy drops to ~78% - but if I exclude it, I get the same results I get in libSVM when excluding that feature. Therefore I am pretty sure it's a problem of missing scaling.

How do I programmatically replicate (i.e. without calling svm-scale) the scaling that libsvm applies?


asked Nov 10 '12 17:11

luke14free

1 Answer

You have that functionality in sklearn.preprocessing:

>>> from sklearn import preprocessing
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> X_scaled = preprocessing.scale(X)

>>> X_scaled                                          
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])

Each feature (column) of X_scaled will then have zero mean and unit variance.
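The snippet above scales a single array in one shot. For a classifier you would normally want the scaling parameters learned from the training data only and then reused on new data. Below is a minimal sketch of one way to do that with a pipeline. The synthetic data, the variable names and the train/test split are made up for illustration and only mimic the question's setup (three features in [0, 1] plus one large one). As far as I know, libsvm's easy.py calls svm-scale with a [-1, 1] range rather than standardizing, so the `MinMaxScaler` variant is probably the closer analogue, but treat that as an assumption and check it against your libsvm version.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the question's data: three features in [0, 1]
# plus one "big", highly variable feature.
rng = np.random.RandomState(0)
X_small = rng.uniform(0.0, 1.0, size=(200, 3))
X_big = rng.uniform(0.0, 10000.0, size=(200, 1))
X = np.hstack([X_small, X_big])
y = (X_big[:, 0] > 5000.0).astype(int)  # made-up labels driven by the big feature

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Option 1: standardize each feature to zero mean and unit variance.
# The scaler is fitted on the training rows only; the pipeline reuses the
# same transform when scoring or predicting.
std_clf = make_pipeline(StandardScaler(), SVC())
std_clf.fit(X_train, y_train)

# Option 2: rescale each feature linearly to [-1, 1], which is assumed here
# to be closer to what svm-scale does when invoked by easy.py.
minmax_clf = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC())
minmax_clf.fit(X_train, y_train)

std_accuracy = std_clf.score(X_test, y_test)
minmax_accuracy = minmax_clf.score(X_test, y_test)
print("StandardScaler accuracy:", std_accuracy)
print("MinMaxScaler accuracy:", minmax_accuracy)
```

Keeping the scaler inside the pipeline also means cross-validation or a grid search refits it on each training fold, so no information from the held-out rows leaks into the scaling.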


answered Sep 29 '22 13:09

Maehler

