Why can't LinearSVC do this simple classification?

Tags: python, scikit-learn, libsvm, liblinear

I'm trying to run the following simple classification with the LinearSVC class in scikit-learn. I've tried both versions 0.10 and 0.14. With this code:

from sklearn.svm import LinearSVC, SVC
import numpy as np

# Four 2-D samples: two near (1000, 1000) and two near (2000, 2000)
data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])  # class label for each row

svc = LinearSVC()
svc.fit(data, groups)
svc.predict(data)

I get the output:

array([2, 2, 2, 2])

However, if I replace the classifier with

svc = SVC(kernel='linear')

then I get the result

array([ 1.,  1.,  2.,  2.])

which is correct. Does anyone know why using LinearSVC would botch this simple problem?

asked Dec 17 '13 01:12

Isaac

1 Answer

The algorithm underlying LinearSVC is very sensitive to extreme values in its input. With feature values in the thousands, the solver hits its iteration limit without converging:

>>> svc = LinearSVC(verbose=1)
>>> svc.fit(data, groups)
[LibLinear]....................................................................................................
optimization finished, #iter = 1000

WARNING: reaching max number of iterations
Using -s 2 may be faster (also see FAQ)

Objective value = -0.001256
nSV = 4
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=1)

(The warning refers to the LibLinear FAQ, since scikit-learn's LinearSVC is based on that library.)

You should standardize the features (zero mean, unit variance) before fitting:

>>> from sklearn.preprocessing import scale
>>> data = scale(data)
>>> svc.fit(data, groups)
[LibLinear]...
optimization finished, #iter = 39
Objective value = -0.240988
nSV = 4
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=1)
>>> svc.predict(data)
array([1, 1, 2, 2])
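
One thing to watch with the snippet above is that scale(data) overwrites the training array, so any new sample would have to be scaled with the same mean and standard deviation before calling predict. A minimal sketch of one way to handle that, assuming a scikit-learn version recent enough to provide make_pipeline and StandardScaler (neither appears in the original answer; the name model is just illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

# StandardScaler learns each column's mean and standard deviation in fit()
# and reapplies the same values in predict(), so the raw, unscaled data can
# be passed in both calls.
model = make_pipeline(StandardScaler(), LinearSVC())
model.fit(data, groups)

predictions = model.predict(data)
print(predictions)
```

On this data the pipeline should give the same labels as the scaled fit above, since the classifier sees the same standardized features.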

answered Sep 29 '22 23:09

Fred Foo
