
Support Vector Machine vs K Nearest Neighbours

I have a data set to classify. Using the kNN algorithm I get an accuracy of 90%, but with an SVM I only manage just over 70%. Isn't SVM supposed to be better than kNN? I know this might be a stupid question, but which SVM parameters would give results roughly comparable to kNN? I am using the libsvm package on MATLAB R2008.

Mohit Jain asked Oct 17 '13 08:10



3 Answers

kNN and SVM represent different approaches to learning. Each approach implies a different model for the underlying data.

SVM assumes there exists a hyperplane separating the data points (quite a restrictive assumption), while kNN attempts to approximate the underlying distribution of the data in a non-parametric fashion (a crude approximation of a Parzen-window estimator).
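To make the contrast concrete, here is a small hand-rolled sketch in plain Python (not libsvm, and not the asker's data). kNN keeps the training set and takes a local majority vote; a linear model commits to a single hyperplane. On a hypothetical XOR-style toy set no single line can get all four clusters right, so a brute-force search over a grid of lines (a stand-in for a linear SVM, not how an SVM is actually trained) tops out at 75%, while 3-NN classifies the test points perfectly.

```python
import itertools
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """kNN: majority vote among the k training points nearest to x (no fitted parameters)."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def linear_predict(w, b, x):
    """Single hyperplane w.x + b = 0: class 1 on the positive side, class 0 otherwise."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


# XOR-style toy data (an assumption for illustration): four tight clusters,
# label 1 when exactly one coordinate is near 1.
offsets = [(0.0, 0.0), (0.1, 0.05), (-0.05, 0.1), (0.05, -0.1)]
corners = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
train_X = [(cx + dx, cy + dy) for (cx, cy) in corners for dx, dy in offsets]
train_y = [label for label in corners.values() for _ in offsets]
test_X = [(0.02, 0.03), (0.04, 1.02), (0.97, 0.05), (1.03, 0.98)]
test_y = [0, 1, 1, 0]

knn_acc = accuracy(lambda x: knn_predict(train_X, train_y, x), test_X, test_y)

# Try every line from a coarse grid and keep the best training accuracy.
weights = [i / 2 for i in range(-4, 5)]   # -2.0 .. 2.0
biases = [i / 4 for i in range(-12, 13)]  # -3.0 .. 3.0
best_linear = max(
    accuracy(lambda x, w=(w1, w2), b=b: linear_predict(w, b, x), train_X, train_y)
    for w1, w2, b in itertools.product(weights, weights, biases)
)

print(knn_acc)      # 1.0
print(best_linear)  # 0.75
```

The 75% ceiling is not an artefact of the grid: any line that keeps one corner cluster on each correct side would need the two diagonals of the square to lie in opposite half-planes, which is impossible because they cross.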

You'll have to look at the specifics of your scenario to decide which algorithm and configuration suit it best.

Shai answered Sep 18 '22 11:09



It really depends on the dataset you are using. If your data looks like the first row of this image ( http://scikit-learn.org/stable/_images/plot_classifier_comparison_1.png ), kNN will work really well and a linear SVM really badly.

If you want the SVM to perform better, you can use a kernel-based SVM like the one in the picture (it uses an RBF kernel).

If you are using scikit-learn for Python, you can experiment with the code here to see how to use a kernel SVM: http://scikit-learn.org/stable/modules/svm.html
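Why a kernel helps can be seen without any library. The RBF kernel in the picture corresponds to an infinite-dimensional feature space, so this sketch swaps in a degree-2 polynomial kernel, whose feature map `phi` is small enough to write out. It checks that the kernel equals an inner product in that feature space, then shows a hyperplane in feature space that solves XOR. The weights `w`, `b` are hand-picked for illustration, not learned by an SVM.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2


def phi(x):
    """Explicit 6-D feature map whose inner product equals poly2_kernel."""
    r2 = math.sqrt(2.0)
    x1, x2 = x
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# The kernel trick: K(x, z) equals phi(x).phi(z), so phi never has to be built explicitly.
for x, z in [((0.3, -1.2), (2.0, 0.7)), ((1.0, 1.0), (0.0, 1.0))]:
    assert math.isclose(poly2_kernel(x, z), dot(phi(x), phi(z)))

# Hand-picked hyperplane in feature space (an assumption, not a trained SVM):
# w.phi(x) + b = x1 + x2 - 2*x1*x2 - 0.5, a curved boundary back in the original 2-D space.
r2 = math.sqrt(2.0)
w, b = (0.0, 1 / r2, 1 / r2, 0.0, 0.0, -2 / r2), -0.5
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
predictions = {x: int(dot(w, phi(x)) + b > 0) for x in xor}
print(predictions == xor)  # True
```

A kernel SVM does the same thing implicitly: it only ever evaluates `K(x_i, x)`, and the optimiser chooses the hyperplane in feature space.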

AdrienNK answered Sep 19 '22 11:09



kNN basically says "if you're close to coordinate x, then the classification will be similar to the outcomes observed at x." In SVM, a close analogue would be a high-dimensional kernel (such as a Gaussian kernel) with a "small" bandwidth parameter, since this makes the SVM fit the training data more locally, i.e. overfit more. That is, the SVM will behave more like "if you're close to coordinate x, then the classification will be similar to those observed at x."
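A rough way to see this, assuming libsvm's RBF parameterisation `exp(-gamma * ||u - v||^2)` (so a small bandwidth σ means a large `gamma`, since `gamma = 1/(2σ²)`): an SVM's decision function has the form `sum_i alpha_i * y_i * K(x_i, x) + b`. The sketch below simplifies that to a plain kernel-weighted vote (all weights equal, no bias), which is a Parzen-style classifier rather than a trained SVM. On hypothetical toy data it shows that with a wide kernel every training point counts, while with a narrow kernel the vote collapses onto the nearest neighbour.

```python
import math


def rbf(x, z, gamma):
    """Gaussian (RBF) kernel as parameterised in libsvm: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


def kernel_vote(train_X, train_y, x, gamma):
    """Sign of sum_i y_i * K(x_i, x) with labels in {+1, -1}; equal weights, no bias."""
    score = sum(y * rbf(xi, x, gamma) for xi, y in zip(train_X, train_y))
    return 1 if score > 0 else -1


def nearest_neighbour(train_X, train_y, x):
    """1-NN: label of the single closest training point."""
    return min(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[1]


# Toy data (assumption): one +1 point close to the query, four -1 points a bit farther away.
train_X = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1), (1.0, -0.1), (1.1, 0.0)]
train_y = [1, -1, -1, -1, -1]
query = (0.4, 0.0)

nn_label = nearest_neighbour(train_X, train_y, query)
wide = kernel_vote(train_X, train_y, query, gamma=0.1)     # large bandwidth
narrow = kernel_vote(train_X, train_y, query, gamma=50.0)  # small bandwidth

print(nn_label)  # 1
print(wide)      # -1: every point has similar weight, so the majority wins
print(narrow)    # 1: only the closest point has non-negligible weight, like 1-NN
```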

I recommend that you start with a Gaussian kernel and check the results for different parameters. In my own experience (which is, of course, focused on certain types of datasets, so your mileage may vary), a tuned SVM outperforms a tuned kNN.
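One common way to "check the results for different parameters" is an exponentially spaced grid over the cost `C` and the RBF `gamma`, scored by cross-validation. The sketch below builds the coarse grid recommended in the libsvm authors' practical guide (`C = 2^-5 ... 2^15`, `gamma = 2^-15 ... 2^3`) and the matching libsvm option strings (`-s 0` C-SVC, `-t 2` RBF, `-c`, `-g`, `-v` folds). The scoring callback is hypothetical: `fake_cv_accuracy` only stands in for whatever cross-validation call you actually use.

```python
import itertools
import math


def libsvm_rbf_options(c, gamma, folds=5):
    """libsvm option string: C-SVC (-s 0), RBF kernel (-t 2), cost -c, gamma -g, n-fold CV (-v)."""
    return f"-s 0 -t 2 -c {c:.10g} -g {gamma:.10g} -v {folds}"


def coarse_grid():
    """Exponentially spaced pairs: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3."""
    cs = [2.0 ** e for e in range(-5, 16, 2)]
    gammas = [2.0 ** e for e in range(-15, 4, 2)]
    return list(itertools.product(cs, gammas))


def grid_search(evaluate, grid):
    """Return (score, C, gamma) for the pair with the highest validation score."""
    return max((evaluate(c, g), c, g) for c, g in grid)


# Hypothetical stand-in for real cross-validation, peaking at C = 2^3, gamma = 2^-7.
def fake_cv_accuracy(c, gamma):
    return 90.0 - abs(math.log2(c) - 3) - abs(math.log2(gamma) + 7)


score, best_c, best_gamma = grid_search(fake_cv_accuracy, coarse_grid())
print(len(coarse_grid()), best_c, best_gamma, score)  # 110 8.0 0.0078125 90.0
print(libsvm_rbf_options(best_c, best_gamma))         # -s 0 -t 2 -c 8 -g 0.0078125 -v 5
```

With libsvm's MATLAB interface, each option string would go as the third argument of `svmtrain(labels, features, options)`; with `-v` it returns the cross-validation accuracy instead of a model, which is what `evaluate` would wrap. A finer grid around the best coarse pair is the usual next step.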

Questions for you:

1) How are you selecting k in kNN?

2) What parameters have you tried for SVM?

3) Are you measuring accuracy in-sample or out-of-sample?
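Questions 1 and 3 are related: if accuracy is measured on the training data itself, 1-NN always scores 100% (each point is its own nearest neighbour), which can make kNN look far better than an SVM. This sketch uses hypothetical synthetic data (two Gaussian blobs with 15% of labels flipped) and plain Python to compare in-sample accuracy with 5-fold cross-validated accuracy, and picks `k` by the latter. The helper names are illustrative, not from any library.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k (point, label) pairs nearest to x."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cross_val_accuracy(data, k, folds=5):
    """Mean out-of-sample accuracy: each fold is predicted by a model that never saw it."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


# Synthetic 2-D data (assumption): two overlapping blobs, 15% of labels flipped as noise.
rng = random.Random(0)
data = []
for i in range(200):
    label = i % 2
    centre = (0.0, 0.0) if label == 0 else (1.5, 1.5)
    point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
    if rng.random() < 0.15:
        label = 1 - label
    data.append((point, label))
rng.shuffle(data)

candidates = [1, 3, 5, 9, 15]  # odd k avoids ties with two classes
in_sample = {k: accuracy(data, data, k) for k in candidates}
out_of_sample = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(candidates, key=out_of_sample.get)

for k in candidates:
    print(f"k={k:2d}  in-sample={in_sample[k]:.2f}  cross-validated={out_of_sample[k]:.2f}")
print("k chosen by cross-validation:", best_k)
```

The same cross-validation split should be used when scoring the SVM, so that both accuracies are out-of-sample and comparable.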

Max answered Sep 22 '22 11:09
