Here is a paste of the code: SVM sample code
I checked out a couple of the other answers to this problem...and it seems like this specific iteration of the problem is a bit different.
First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (mostly healthy 0.5s and 0.7s; few values near 0 or near 1).
I have about 70 x inputs corresponding to 70 y values. The y values are also normalized (they are the percentage changes of my function after each time-step).
I initialize my SVR (and SVC), train them, and then test them with 30 out-of-sample inputs...and get the exact same prediction for every input (and the inputs are changing by reasonable amounts--0.3, 0.6, 0.5, etc.). I would think that the classifier (at least) would have some differentiation...
Here is the code I've got:
from sklearn import svm
import pandas as pd

# train svr
my_svr = svm.SVR()
my_svr.fit(x_training, y_trainr)
# train svc
my_svc = svm.SVC()
my_svc.fit(x_training, y_trainc)
# predict regression
p_regression = my_svr.predict(x_test)
p_r_series = pd.Series(index=y_testing.index, data=p_regression)
# predict classification
p_classification = my_svc.predict(x_test)
p_c_series = pd.Series(index=y_testing_classification.index, data=p_classification)
And here are samples of my inputs:
x_training = [[ 1.52068627e-04 8.66880301e-01 5.08504362e-01 9.48082047e-01
7.01156322e-01],
[ 6.68130520e-01 9.07506250e-01 5.07182647e-01 8.11290634e-01
6.67756208e-01],
... x 70 ]
y_trainr = [-0.00723209 -0.01788079 0.00741741 -0.00200805 -0.00737761 0.00202704 ...]
y_trainc = [ 0. 0. 1. 0. 0. 1. 1. 0. ...]
And the x_test matrix (5x30) is similar to the x_training matrix in terms of the magnitudes and variance of the inputs...same for y_testr and y_testc.
Currently, the predictions for all of the tests are exactly the same (0.00596 for the regression, and 1 for the classification...)
How do I get the SVR and SVC functions to spit out relevant predictions? Or at least different predictions based on the inputs...
At the very least, the classifier should be able to make choices. I mean, even if I haven't provided enough dimensions for regression...
As discussed earlier, SVMs are used for both classification and regression problems. Scikit-learn's Support Vector Classification (SVC) method can be extended to solve regression problems as well; that extension is called Support Vector Regression (SVR).
SVC is a classifier; SVR is a regressor.
Anyone working in machine learning or data science is familiar with the term SVM (Support Vector Machine), but SVR is a bit different. As the name suggests, SVR is a regression algorithm, so it works with continuous values rather than the classification that SVC performs.
SVR has an additional tunable parameter ε (epsilon). The value of epsilon determines the width of the tube around the estimated function (hyperplane). Points that fall inside this tube are treated as correct predictions and are not penalized by the algorithm.
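As a rough illustration (the toy data and variable names below are my own, not from the original post), a default epsilon that is wider than the spread of the targets lets every training point sit inside the tube, which is exactly how you end up with a near-constant prediction; shrinking epsilon forces the model to track the variation:
import numpy as np
from sklearn import svm

# hypothetical toy data: targets roughly the same tiny scale as y_trainr in the question
rng = np.random.RandomState(0)
X = rng.rand(70, 5)
y = 0.01 * np.sin(10 * X[:, 0]) + 0.001 * rng.randn(70)

# the default epsilon (0.1) is larger than the spread of y, so the whole target
# range fits inside the tube and the fit can come out essentially flat
wide = svm.SVR().fit(X, y)

# a much smaller epsilon (plus a larger C) makes the model fit the variation in y
narrow = svm.SVR(C=1000, epsilon=0.0001).fit(X, y)

print(np.std(wide.predict(X)), np.std(narrow.predict(X)))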
Try increasing your C from the default. It seems you are underfitting.
my_svc = svm.SVC(probability=True, C=1000)
my_svc.fit(x_training, y_trainc)
p_classification = my_svc.predict(x_test)
p_classification then becomes:
array([ 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0.,
1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1.,
1., 1., 1., 1.])
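As a side note (not part of the original answer), since probability=True was set above, you can also inspect the per-class probabilities to confirm the classifier really is differentiating between inputs:
# class membership probabilities, one row per test sample (requires probability=True)
p_proba = my_svc.predict_proba(x_test)
print(p_proba[:5])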
For the SVR case you will also want to reduce your epsilon.
my_svr = svm.SVR(C=1000, epsilon=0.0001)
my_svr.fit(x_training, y_trainr)
p_regression = my_svr.predict(x_test)
p_regression then becomes:
array([-0.00430622, 0.00022762, 0.00595002, -0.02037147, -0.0003767 ,
0.00212401, 0.00018503, -0.00245148, -0.00109994, -0.00728342,
-0.00603862, -0.00321413, -0.00922082, -0.00129351, 0.00086844,
0.00380351, -0.0209799 , 0.00495681, 0.0070937 , 0.00525708,
-0.00777854, 0.00346639, 0.0070703 , -0.00082952, 0.00246366,
0.03007465, 0.01172834, 0.0135077 , 0.00883518, 0.00399232])
You should tune your C parameter using cross-validation so that it performs best on whichever metric matters most to you. You may want to look at GridSearchCV to help you do this.
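For example, a minimal sketch of such a search might look like the following (the grid values here are illustrative guesses, not tuned for the poster's data):
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# hypothetical search grid; widen or narrow the ranges for your own data
param_grid = {
    "C": [1, 10, 100, 1000],
    "epsilon": [0.0001, 0.001, 0.01, 0.1],
}

# score with negative mean squared error so that "higher is better" for the search
search = GridSearchCV(svm.SVR(), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(x_training, y_trainr)

print(search.best_params_)
my_svr = search.best_estimator_
p_regression = my_svr.predict(x_test)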
I had the same issue, but a completely different cause, and therefore a completely different place to look for a solution.
If your prediction inputs are scaled incorrectly for any reason, you can experience the same symptoms found here. This could come from forgetting (or miscoding) the scaling of input values for a later prediction, or from the inputs being in the wrong order.
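One way to guard against that (a sketch only; the pipeline setup is my suggestion, not something from the original answer) is to bundle the scaler and the model so the exact same fitted scaling is applied at prediction time:
from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# the scaler is fit on the training data only; the pipeline then reuses
# those same statistics automatically whenever it predicts
model = make_pipeline(StandardScaler(), svm.SVR(C=1000, epsilon=0.0001))
model.fit(x_training, y_trainr)

p_regression = model.predict(x_test)  # x_test is scaled with the training statistics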