Sklearn SVM: SVR and SVC, getting the same prediction for every input

Here is a paste of the code: SVM sample code

I checked out a couple of the other answers to this problem, and it seems this specific instance of it is a bit different.

First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (healthy 0.5s and 0.7s, etc.; few values near 0 or near 1).

I have about 70 x samples, each paired with its corresponding y value. The y values are also normalized (they are the percentage changes of my function after each time step).

I initialize my SVR (and SVC), train them, and then test them with 30 out-of-sample inputs... and get the exact same prediction for every input (even though the inputs vary by reasonable amounts: 0.3, 0.6, 0.5, etc.). I would think that the classifier, at least, would show some differentiation...

Here is the code I've got:

# imports assumed by this snippet
from sklearn import svm
import pandas as pd

# train svr

my_svr = svm.SVR()
my_svr.fit(x_training,y_trainr)

# train svc

my_svc = svm.SVC()
my_svc.fit(x_training,y_trainc)


# predict regression

p_regression = my_svr.predict(x_test)
p_r_series = pd.Series(index=y_testing.index,data=p_regression)

# predict classification

p_classification = my_svc.predict(x_test)
p_c_series = pd.Series(index=y_testing_classification.index,data=p_classification)

And here are samples of my inputs:

x_training = [[  1.52068627e-04   8.66880301e-01   5.08504362e-01   9.48082047e-01   7.01156322e-01],
              [  6.68130520e-01   9.07506250e-01   5.07182647e-01   8.11290634e-01   6.67756208e-01],
              ... x 70 ]

y_trainr = [-0.00723209 -0.01788079  0.00741741 -0.00200805 -0.00737761  0.00202704 ...]

y_trainc = [ 0.  0.  1.  0.  0.  1.  1.  0. ...]

And the x_test matrix (30 rows of 5 features) is similar to the x_training matrix in terms of the magnitudes and variance of the inputs... same for y_testr and y_testc.

Currently, the predictions for all of the tests are exactly the same (0.00596 for the regression, and 1 for the classification...)

How do I get the SVR and SVC functions to spit out relevant predictions? Or at least different predictions based on the inputs...

At the very least, the classifier should be able to make choices. I mean, even if I haven't provided enough dimensions for regression...

asked Dec 26 '15 by Chris


People also ask

What is SVC and SVR in SVM?

As discussed earlier, SVM is used for both classification and regression problems. Scikit-learn's method of Support Vector Classification (SVC) can be extended to solve regression problems as well. That extended method is called Support Vector Regression (SVR).

What is difference between SVR and SVC?

SVC is a classifier, SVR is a regressor.

What is the difference between SVM and SVR?

Those who work in machine learning or data science are quite familiar with the term SVM, or Support Vector Machine. But SVR is a bit different from SVM. As the name suggests, SVR is a regression algorithm, so it works with continuous values, as opposed to the classification that SVM performs.

What is epsilon SVR?

SVR has an additional tunable parameter ε (epsilon). The value of epsilon determines the width of the tube around the estimated function (hyperplane). Points that fall inside this tube are considered correct predictions and are not penalized by the algorithm.
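
A rough illustration of that tube-width effect (a synthetic-data sketch, not code from the question):

# Synthetic-data sketch: when epsilon is wider than the spread of the
# targets, every point sits inside the tube and the fit collapses to a
# constant; shrinking epsilon brings support vectors (and variation) back.
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.rand(70, 5)
y = 0.01 * (X[:, 0] - X[:, 1])  # small targets, like normalized percent changes

wide = svm.SVR(epsilon=0.1).fit(X, y)      # default epsilon dwarfs the targets
narrow = svm.SVR(epsilon=0.0001).fit(X, y)

print(len(wide.support_))    # few or no support vectors -> near-constant predictions
print(len(narrow.support_))  # many support vectors -> input-dependent predictions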


2 Answers

Try increasing your C from the default. It seems you are underfitting.

my_svc = svm.SVC(probability=True, C=1000)
my_svc.fit(x_training,y_trainc)

p_classification = my_svc.predict(x_test)

p_classification then becomes:

array([ 1.,  0.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,
        1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.])

For the SVR case you will also want to reduce your epsilon: the default of 0.1 is wider than the spread of your y values (which are on the order of ±0.01), so nearly every training point falls inside the epsilon tube and the fit degenerates to a constant.

my_svr = svm.SVR(C=1000, epsilon=0.0001)
my_svr.fit(x_training,y_trainr)

p_regression = my_svr.predict(x_test)

p_regression then becomes:

array([-0.00430622,  0.00022762,  0.00595002, -0.02037147, -0.0003767 ,
        0.00212401,  0.00018503, -0.00245148, -0.00109994, -0.00728342,
       -0.00603862, -0.00321413, -0.00922082, -0.00129351,  0.00086844,
        0.00380351, -0.0209799 ,  0.00495681,  0.0070937 ,  0.00525708,
       -0.00777854,  0.00346639,  0.0070703 , -0.00082952,  0.00246366,
        0.03007465,  0.01172834,  0.0135077 ,  0.00883518,  0.00399232])

You should tune your C parameter using cross-validation so that the model performs best on whichever metric matters most to you. GridSearchCV can help with this; see the sketch below.
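
A minimal tuning sketch, assuming the current sklearn.model_selection import path (the grid values are illustrative, not tuned for this data):

# Illustrative grids only; adjust the ranges and metric to your own problem.
from sklearn import svm
from sklearn.model_selection import GridSearchCV

svc_search = GridSearchCV(svm.SVC(), param_grid={'C': [1, 10, 100, 1000]}, cv=5)
svc_search.fit(x_training, y_trainc)

svr_search = GridSearchCV(svm.SVR(),
                          param_grid={'C': [1, 10, 100, 1000],
                                      'epsilon': [0.1, 0.001, 0.0001]},
                          cv=5, scoring='neg_mean_squared_error')
svr_search.fit(x_training, y_trainr)

print(svc_search.best_params_, svr_search.best_params_)

Since refit=True by default, svc_search.predict(x_test) will use the best parameters found during the search.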

answered Oct 18 '22 by David Maust


I had the same issue, but a completely different cause, and therefore a completely different place to look for a solution.

If your prediction inputs are scaled incorrectly for any reason, you can see exactly the symptoms described here. This can happen if you forget (or miscode) the scaling of input values at prediction time, or if the inputs arrive in the wrong order (see the sketch below).
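
One guard against the scaling half of this (a sketch, not part of the original answer; it assumes scikit-learn's Pipeline and StandardScaler):

# Bundling the scaler with the model guarantees the exact same transform
# (fit on the training data only) is reapplied at prediction time.
from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), svm.SVC(C=1000))
model.fit(x_training, y_trainc)           # scaler statistics come from x_training
p_classification = model.predict(x_test)  # x_test is scaled identically

Note that this only guards against scaling drift; a column-order mistake still needs care, for example by keeping features in a pandas DataFrame with named columns.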

answered Oct 18 '22 by James Nowell