Say I am working on a machine learning model in R using naive Bayes. I would build the model using the naiveBayes function (from the e1071 package) as follows:
model <- naiveBayes(Class ~ ., data = HouseVotes84)
I can also inspect the model's parameters (the conditional probabilities) by just printing the model.
And I do the prediction as follows, which gives me the class probabilities for each row (with type = "class" it would return the predicted class instead):
predict(model, HouseVotes84[1:10,], type = "raw")
However, my question is: is there a way to see which of the columns affected this prediction the most? For example, if the response variable were whether a student fails a class and the other columns were the candidate predictors, I would like to know which factors contribute most to a failing prediction.
My question is for any package in R; naiveBayes above is just an example.
The answer depends on how you want to do the feature selection.
If it is part of the model-building process and not some post-hoc analysis, you could use caret and its feature-selection wrapper methods to determine the best subset of features, e.g. recursive feature elimination or genetic algorithms, or filter features using univariate analysis (see the sketch below).
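To illustrate the wrapper approach, here is a minimal sketch of recursive feature elimination with caret on the HouseVotes84 data from the question (the subset sizes and the random-forest ranking functions are just illustrative choices, not part of the original answer):

library(caret)
library(mlbench)

data(HouseVotes84, package = "mlbench")
votes <- na.omit(HouseVotes84)            # rfe needs complete cases

# random-forest-based ranking, evaluated with 5-fold cross-validation
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)

rfe_fit <- rfe(votes[, -1], votes$Class,
               sizes = c(2, 4, 8, 12),    # candidate subset sizes to try
               rfeControl = ctrl)

predictors(rfe_fit)                        # the selected subset of variables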
If it is part of your post-hoc analysis based solely on your prediction, then it depends on the type of model you have used. caret also supports this functionality, but only for compatible models (see the sketch below).
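As a rough sketch of that route, again on the question's HouseVotes84 data and with rpart standing in for any caret-compatible model (both are illustrative choices):

library(caret)

data(HouseVotes84, package = "mlbench")
votes <- na.omit(HouseVotes84)

# fit through caret's train(), then ask for the model's importance scores
fit <- train(Class ~ ., data = votes, method = "rpart")
varImp(fit)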
For svm, with the exception of linear kernels, determining the importance of the coefficients is highly non-trivial. I'm unaware of any attempt at general feature ranking for svm, regardless of language (please tell me if it exists!).
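For the linear-kernel case, one common trick, sketched here with e1071 and a two-class subset of iris purely for illustration, is to reconstruct the weight vector from the support vectors; larger absolute weights point to more influential features. Note that the weights live in the scaled feature space svm uses by default.

library(e1071)

two_class <- droplevels(subset(iris, Species != "setosa"))
fit <- svm(Species ~ ., data = two_class, kernel = "linear")

# for a two-class linear SVM, the weight vector is t(coefs) %*% SV
w <- t(fit$coefs) %*% fit$SV
sort(abs(drop(w)), decreasing = TRUE)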
With rpart (as it's tagged in the question) you can just visually look at the nodes: the closer a split is to the root of the tree, the more important it is. The importance can also be extracted with the caret package:
library(rpart)
library(caret)

# kyphosis ships with the rpart package
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# variable importance from the fitted tree
caret::varImp(fit)
# Overall
#Age 5.896114
#Number 3.411081
#Start 8.865279
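To do the visual inspection described above, a quick plot of the tree is enough (base rpart plotting shown here; rpart.plot gives prettier output):

plot(fit, margin = 0.1)   # splits near the root are the more important ones
text(fit, use.n = TRUE)   # label splits and show observation counts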
With naiveBayes you can see it from your model output. You just have to stare really hard:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
model
#
#Naive Bayes Classifier for Discrete Predictors
#
#Call:
#naiveBayes.default(x = X, y = Y, laplace = laplace)
#
#A-priori probabilities:
#Y
# democrat republican
# 0.6137931 0.3862069
#
#Conditional probabilities:
# V1
#Y n y
# democrat 0.3953488 0.6046512
# republican 0.8121212 0.1878788
#
# V2
#Y n y
# democrat 0.4979079 0.5020921
# republican 0.4932432 0.5067568
A very brief glance shows that at least V1 looks like a better variable than V2.
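If staring gets old, here is a rough sketch of quantifying the same comparison (not part of the original output): e1071's naiveBayes object stores one conditional-probability table per predictor in model$tables, so the gap between the two classes' probabilities gives a crude importance score.

# largest class-conditional probability gap per predictor
prob_gap <- sapply(model$tables, function(tab)
  max(abs(tab["democrat", ] - tab["republican", ])))
sort(prob_gap, decreasing = TRUE)[1:5]   # top 5 most discriminating votes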