 

Techniques to improve the accuracy of SVM classifier


I am trying to build a classifier to predict breast cancer using the UCI dataset. I am using support vector machines. Despite my best efforts to improve the accuracy of the classifier, I cannot get beyond 97.062%. I've tried the following:

1. Finding the optimal C and gamma using grid search.
2. Finding the most discriminative features using the F-score.
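For reference, step 1 can be sketched with scikit-learn's GridSearchCV; the parameter ranges below are illustrative assumptions, not the ones actually used:

```python
# Sketch: grid search over C and gamma for an RBF-kernel SVM,
# using the UCI breast cancer dataset bundled with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Illustrative log-spaced ranges; the real search ranges are a judgment call.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
}

# 5-fold cross-validated accuracy for every (C, gamma) combination.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```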

Can someone suggest techniques to improve the accuracy? I am aiming for at least 99%.

1. The data are already normalized to the range [0, 10]. Will normalizing to [0, 1] help?

2. Is there some other method to find the best C and gamma?
Prashant Pandey asked Aug 17 '16




1 Answer

For SVM it is important that all features share the same scale. Normally this is done by scaling each feature (column) to zero mean and unit variance; another option is min-max scaling so that, for example, the minimum is 0 and the maximum is 1. However, there is no practical difference between [0, 1] and [0, 10]: both will give the same performance once C and gamma are re-tuned for the new scale.
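Both scaling options can be sketched with scikit-learn; wrapping the scaler in a Pipeline ensures it is fit only on the training portion of each cross-validation fold (the dataset and parameter defaults here are illustrative):

```python
# Sketch: zero-mean/unit-variance scaling vs. min-max scaling to [0, 1],
# each feeding an RBF SVM via a Pipeline so the scaler is fit per CV fold.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "standardized": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "min-max": make_pipeline(MinMaxScaler(feature_range=(0, 1)),
                             SVC(kernel="rbf")),
}

# Compare mean 5-fold accuracy under the two scaling schemes.
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```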

If you insist on using an SVM for classification, another approach that may improve results is ensembling multiple SVMs. If you are using Python, you can try BaggingClassifier from sklearn.ensemble.
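A minimal sketch of that suggestion, assuming the same scikit-learn breast cancer dataset (the subsample fraction and ensemble size are arbitrary choices for illustration):

```python
# Sketch: bagging an ensemble of RBF SVMs with BaggingClassifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Base learner: standardize features, then fit an RBF SVM.
base = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Train 10 SVMs, each on a random 80% subsample, and vote on predictions.
ensemble = BaggingClassifier(base, n_estimators=10,
                             max_samples=0.8, random_state=0)

scores = cross_val_score(ensemble, X, y, cv=5)
print(round(scores.mean(), 3))
```

Bagging mostly helps when the base classifier is unstable; a well-tuned SVM is fairly stable, so the gain may be small.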

Also notice that you can't expect perfect performance from a real set of training data. I think 97% is very good performance; if you push much beyond this, you are likely just overfitting the data.

Mahsa.Ghasemi answered Sep 24 '22