I'm trying to apply feature selection (e.g. recursive feature selection) to an SVM in R. I've installed Weka, which supports feature selection with LibSVM, but I haven't found any example of the syntax for SVM or anything similar. A short example would be a great help.
Thus, the SVM model with feature selection becomes a mixed-integer model and therefore a non-convex model. Since the SVM problem without feature selection is a linear programming problem and thus a convex problem, its Pareto frontier can be obtained by varying the parameter .
SVMs are used in applications like handwriting recognition, intrusion detection, face detection, email classification, gene classification, and web page classification. They can handle both classification and regression on linear and non-linear data, which is one of the reasons we use SVMs in machine learning.
Forward Feature Selection using SVM
Forward feature selection starts with a single feature and then adds features one at a time: at each step, the candidate feature that most improves the model's performance is added to the selected set, and the process stops once no addition helps or a target number of features is reached. The resulting feature subset can then be used to fit and evaluate the final SVM model.
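The procedure can be sketched in R roughly as follows. This is a minimal illustration rather than a reference implementation: forward_select is a hypothetical helper written for this example (it is not part of any package), it scores each candidate subset with the cross-validated accuracy that svm() from e1071 reports when cross > 0, and it uses the built-in iris data purely as a stand-in dataset.

```r
library(e1071)

# Hypothetical helper (not from any package): greedy forward selection that
# scores each candidate subset by k-fold cross-validated SVM accuracy.
forward_select <- function(x, y, max_features = ncol(x), folds = 5) {
  selected   <- character(0)
  remaining  <- colnames(x)
  best_score <- -Inf
  while (length(remaining) > 0 && length(selected) < max_features) {
    # Cross-validated accuracy (in percent) for each candidate addition
    scores <- sapply(remaining, function(f) {
      cols <- c(selected, f)
      fit  <- svm(as.matrix(x[, cols, drop = FALSE]), y, cross = folds)
      fit$tot.accuracy
    })
    # Stop as soon as no candidate improves on the current best score
    if (max(scores) <= best_score) break
    best_score <- max(scores)
    best       <- names(scores)[which.max(scores)]
    selected   <- c(selected, best)
    remaining  <- setdiff(remaining, best)
  }
  list(features = selected, cv_accuracy = best_score)
}

set.seed(1)  # the cross-validation folds are random
result <- forward_select(iris[, 1:4], iris$Species)
print(result$features)
print(result$cv_accuracy)
```

Because the folds are drawn at random, the selected features can differ between runs unless the seed is fixed. For anything beyond a small dataset, a resampling framework such as caret is likely a better fit than a hand-written loop.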
To use SVM in R, we have the e1071 package. The package is not preinstalled, so run install.packages("e1071") once to install it, and then load it with library(e1071). The syntax of its svm() function is quite similar to that of lm() for linear regression: you pass a model formula and a data frame.
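As a rough illustration of that formula interface, the snippet below fits a classifier on R's built-in iris data. The dataset and the radial kernel are choices made for this example only, not recommendations.

```r
# install.packages("e1071")  # run once if the package is not installed
library(e1071)

# Formula interface, as with lm(): response ~ predictors, plus a data frame
fit <- svm(Species ~ ., data = iris, kernel = "radial")

# Predict on the training data and tabulate predicted vs. actual classes
pred <- predict(fit, iris)
confusion <- table(predicted = pred, actual = iris$Species)
print(confusion)
```

The table only shows the fit on the training data, so it says little about how the model generalises. Passing cross = 5 to svm() adds a cross-validated accuracy estimate to the fitted object.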
The function rfe in the caret package performs recursive feature selection for various algorithms. Here's an example from the caret documentation:
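library(caret)
data(BloodBrain, package="caret")
# Drop near-zero-variance descriptors, then centre and scale the rest
x <- scale(bbbDescr[,-nearZeroVar(bbbDescr)])
# Drop descriptors that are highly correlated (cutoff 0.8)
x <- x[, -findCorrelation(cor(x), .8)]
x <- as.data.frame(x)
# Recursive feature selection with 200 bootstrap resamples
svmProfile <- rfe(x, logBBB,
                  sizes = c(2, 5, 10, 20),
                  rfeControl = rfeControl(functions = caretFuncs,
                                          number = 200),
                  ## pass options to train()
                  method = "svmRadial")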
# Here's what your results look like (this can take some time)
> svmProfile
Recursive feature selection

Outer resampling method: Bootstrap (200 reps)

Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         2 0.6106   0.4013 0.05581    0.08162
         5 0.5689   0.4777 0.05305    0.07665
        10 0.5510   0.5086 0.05253    0.07222
        20 0.5203   0.5628 0.04892    0.06721
        71 0.5202   0.5630 0.04911    0.06703        *

The top 5 variables (out of 71):
   fpsa3, tcsa, prx, tcpa, most_positive_charge
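
To use the selected features afterwards, something like the following may help. It is a sketch under a few assumptions: it reruns the same data preparation so that it is self-contained, it swaps the 200 bootstrap resamples for 5-fold cross-validation (and passes a 5-fold trainControl() through to train()) purely to shorten the run time, and quickProfile is just a name chosen for this example. The kernlab package must also be installed for method = "svmRadial".

```r
library(caret)
data(BloodBrain, package = "caret")

# Same preprocessing as in the example above
x <- scale(bbbDescr[, -nearZeroVar(bbbDescr)])
x <- x[, -findCorrelation(cor(x), .8)]
x <- as.data.frame(x)

set.seed(1)
# Assumption for speed: 5-fold CV instead of 200 bootstrap resamples
quickProfile <- rfe(x, logBBB,
                    sizes = c(2, 5, 10, 20),
                    rfeControl = rfeControl(functions = caretFuncs,
                                            method = "cv", number = 5),
                    ## options passed on to train()
                    method = "svmRadial",
                    trControl = trainControl(method = "cv", number = 5))

# Names of the variables in the selected subset
selected <- predictors(quickProfile)
print(selected)

# Resampled performance for each subset size, as a data frame
print(quickProfile$results)

# Keep only the selected columns for later modelling
x_selected <- x[, selected, drop = FALSE]
```

With fewer resamples the performance estimates are noisier, so the selected subset size may differ from the one shown in the output above.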