
Why can't LinearSVC do this simple classification?

I'm trying to do the following simple classification using the LinearSVC object in scikit-learn. I've tried both versions 0.10 and 0.14. Using this code:

from sklearn.svm import LinearSVC, SVC
import numpy as np

data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

svc = LinearSVC()
svc.fit(data, groups)
svc.predict(data)

I get the output:

array([2, 2, 2, 2])

However, if I replace the classifier with

svc = SVC(kernel='linear')

then I get the result

array([ 1.,  1.,  2.,  2.])

which is correct. Does anyone know why using LinearSVC would botch this simple problem?

Isaac asked Dec 17 '13 01:12


People also ask

What is the difference between SVC and LinearSVC?

The main difference is that LinearSVC fits only a linear classifier, whereas SVC lets you choose from a variety of kernels, including non-linear ones. However, SVC can be very slow on large datasets, because its training time grows at least quadratically with the number of samples.
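As a rough sketch of that API difference, reusing the four points from the question (the standardization step and the variable names are assumptions for illustration): SVC takes a kernel argument, while LinearSVC has none and always fits a linear model. The two linear variants also optimize slightly different objectives by default (squared hinge loss for LinearSVC, hinge loss for SVC), so their coefficients need not match exactly.

```python
import numpy as np
from sklearn.preprocessing import scale
from sklearn.svm import SVC, LinearSVC

# The four points from the question, standardized column by column.
X = scale(np.array([[1007., 1076.],
                    [1017., 1009.],
                    [2021., 2029.],
                    [2060., 2085.]]))
y = np.array([1, 1, 2, 2])

# SVC exposes a `kernel` parameter; LinearSVC does not and is always linear.
linear_svc = LinearSVC(random_state=0).fit(X, y)
svc_linear = SVC(kernel="linear").fit(X, y)
svc_rbf = SVC(kernel="rbf").fit(X, y)

for name, model in [("LinearSVC", linear_svc),
                    ("SVC(kernel='linear')", svc_linear),
                    ("SVC(kernel='rbf')", svc_rbf)]:
    print(name, model.predict(X))
```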

Is LinearSVC faster than SVC?

Between SVC and LinearSVC, one important decision criterion is that LinearSVC tends to converge faster as the number of samples grows. This is because the linear kernel is a special case that is optimized for in Liblinear, but not in Libsvm.

What is LinearSVC in machine learning?

Linear Support Vector Classification (LinearSVC) is an algorithm that attempts to find a hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest samples of each class.

What is an SVC classifier?

SVC, or Support Vector Classifier, is a supervised machine learning algorithm typically used for classification tasks. SVC works by mapping data points to a high-dimensional space and then finding the optimal hyperplane that divides the data into two classes.


1 Answer

The algorithm underlying LinearSVC is very sensitive to extreme values in its input:

>>> svc = LinearSVC(verbose=1)
>>> svc.fit(data, groups)
[LibLinear]....................................................................................................
optimization finished, #iter = 1000

WARNING: reaching max number of iterations
Using -s 2 may be faster (also see FAQ)

Objective value = -0.001256
nSV = 4
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=1)

(The warning refers to the LibLinear FAQ, since scikit-learn's LinearSVC is based on that library.)
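In more recent scikit-learn releases, a LibLinear run that hits its iteration limit is reported as a ConvergenceWarning rather than only in the verbose log. The following is a minimal sketch of checking for it programmatically; whether the warning actually fires on this data depends on the installed version and on which solver its defaults select, so no particular outcome is assumed here.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# The unscaled data from the question.
data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

# Record every warning emitted while fitting instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LinearSVC(random_state=0).fit(data, groups)

# True if the solver stopped at max_iter without converging.
did_not_converge = any(issubclass(w.category, ConvergenceWarning)
                       for w in caught)
print("ConvergenceWarning raised:", did_not_converge)
```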

You should normalize the features before fitting; scale standardizes each column to zero mean and unit variance:

>>> from sklearn.preprocessing import scale
>>> data = scale(data)
>>> svc.fit(data, groups)
[LibLinear]...
optimization finished, #iter = 39
Objective value = -0.240988
nSV = 4
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=1)
>>> svc.predict(data)
array([1, 1, 2, 2])
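One way to avoid forgetting the scaling step at prediction time is to bundle it with the classifier. This is a sketch rather than part of the original answer: it assumes make_pipeline and StandardScaler, which applies the same standardization as scale but remembers the per-column statistics learned during fit. The name model is illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

# StandardScaler learns each column's mean and standard deviation in fit()
# and reapplies them in predict(), so raw inputs can be passed directly.
model = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
model.fit(data, groups)

pred = model.predict(data)
print(pred)
```

With the scaler inside the pipeline, new samples can be passed to model.predict in their original units.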
Fred Foo answered Sep 29 '22 23:09