I have built a linear SVM model for a two-class problem (classes 1 and 0) using the following code:
class1.svm.model <- svm(Class ~ ., data = training,cost=1,cross=10, metric="ROC",type="C-classification",kernel="linear",na.action=na.omit,probability = TRUE)
and I have extracted the weights from the trained model using the following code:
#extract the weights and constant from the SVM model:
w <- t(class1.svm.model$coefs) %*% class1.svm.model$SV;
b <- -1 * class1.svm.model$rho; #(sometimes called w0)
I get weights for each feature like the following example:
X2 0.001710949
X3 -0.002717934
X4 -0.001118897
X5 0.009280056
X993 -0.000256577
X1118 0
X1452 0.004280963
X2673 0.002971335
X4013 -0.004369505
Now, how do I perform feature selection based on the weight extracted for each feature? How should I build a weight matrix?
I have read papers on this, but the concept is still not clear to me. Please help!
I've dashed this answer off rather quickly, so I expect there will be quite a few points that others can expand on, but as something to get you started...
There are a number of ways of doing this, but the first thing to tackle is converting the linear weights into a measure of how important each feature is to the classification. This is a relatively simple three-step process:
Optionally you can generate a more robust measure of feature importance by repeating the above several times on different sets of training data which you have created by randomly re-sampling your original training data.
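As a rough illustration of the above, here is a minimal sketch in R using e1071. It assumes that importance is taken to be the absolute value of each linear weight, and that the features are standardised before training (e1071's default `scale = TRUE`) so that the weights are comparable. The helper names `svm_importance` and `bootstrap_importance`, and the iris-based toy data, are made up for this example; feature names are assumed to be syntactic R names.

```r
library(e1071)

# Hypothetical helper: absolute linear-SVM weights as feature importances.
# Assumes a two-class problem and a data frame `df` with a factor column `Class`.
svm_importance <- function(df, cost = 1) {
  # scale = TRUE standardises each feature before fitting, so weights
  # are on a comparable footing across features.
  model <- svm(Class ~ ., data = df, type = "C-classification",
               kernel = "linear", cost = cost, scale = TRUE)
  w <- t(model$coefs) %*% model$SV      # 1 x p weight vector
  imp <- abs(drop(w))                   # the sign only encodes direction
  names(imp) <- colnames(model$SV)
  imp
}

# Hypothetical helper: average the importances over bootstrap resamples.
# Assumes every resample still contains both classes.
bootstrap_importance <- function(df, n_rep = 20, cost = 1) {
  reps <- replicate(n_rep, {
    idx <- sample(nrow(df), replace = TRUE)
    svm_importance(df[idx, , drop = FALSE], cost = cost)
  })
  sort(rowMeans(reps), decreasing = TRUE)
}

# Toy usage on two of the iris species (illustrative data only).
set.seed(1)
toy <- droplevels(subset(iris, Species != "setosa"))
names(toy)[5] <- "Class"
bootstrap_importance(toy, n_rep = 10)
```

Averaging over resamples is only one way to combine the repeated runs; averaging ranks instead of raw weights would be a reasonable alternative.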
Now that you have a way to determine how important each feature is to the classification, you can use this in a number of different ways to select which features to include in your final model. I will give an example of Recursive Feature Elimination, since it is one of my favourites, but you may want to look into iterative feature selection, or noise perturbation.
So, to perform recursive feature elimination:
[1] where a small enough set of features is determined by the point at which the accuracy begins to suffer when you apply your model to a validation set. On which note: when selecting features this way, make sure that you have not only separate training and test sets, but also a validation set for use in choosing how many features to keep.
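For what it's worth, recursive feature elimination with a linear SVM can be sketched roughly as below. This is one plausible reading of the procedure, not a definitive implementation: it trains, ranks features by absolute weight, discards the least important one(s), and repeats, recording validation accuracy at every step. The function name `rfe_linear_svm`, its parameters, and the toy data are assumptions made for the example.

```r
library(e1071)

# Hypothetical helper: recursive feature elimination with a linear SVM.
# `train` and `valid` are data frames sharing a two-level factor column `Class`.
rfe_linear_svm <- function(train, valid, drop_per_step = 1, min_features = 1) {
  features <- setdiff(names(train), "Class")
  history <- list()
  repeat {
    fit <- svm(Class ~ ., data = train[, c(features, "Class"), drop = FALSE],
               type = "C-classification", kernel = "linear", scale = TRUE)
    # Score the current feature set on the validation data.
    pred <- predict(fit, valid[, features, drop = FALSE])
    acc <- mean(as.character(pred) == as.character(valid$Class))
    history[[length(history) + 1]] <- list(features = features, accuracy = acc)
    if (length(features) <= min_features) break
    # Importance = absolute linear weight; drop the smallest one(s).
    imp <- abs(drop(t(fit$coefs) %*% fit$SV))
    names(imp) <- colnames(fit$SV)
    n_drop <- min(drop_per_step, length(features) - min_features)
    features <- setdiff(features, names(sort(imp))[seq_len(n_drop)])
  }
  history
}

# Toy usage: inspect validation accuracy as features are removed.
set.seed(1)
toy <- droplevels(subset(iris, Species != "setosa"))
names(toy)[5] <- "Class"
idx <- sample(nrow(toy), 70)
steps <- rfe_linear_svm(toy[idx, ], toy[-idx, ])
sapply(steps, function(s) s$accuracy)
```

You would then keep the smallest feature set that comes before the validation accuracy starts to drop, and report final performance on the untouched test set. With thousands of features, setting `drop_per_step` higher than 1 keeps the number of refits manageable.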