 

Interpret random forest model for text classification

I have a text dataset in which I have manually labeled each record as one of two possible classes. I built a TF-IDF matrix on the corpus (excluding English stop words), trained and tested a Random Forest classifier, evaluated the model, and applied it to a larger corpus of text. All is good so far, but how can I find out more about my model, i.e., how can I find out which words are "important" to the model?

asked Nov 17 '25 at 08:11 by user1624577



1 Answer

A trained RandomForestClassifier has a feature_importances_ attribute. You do not need to train with oob_score=True for this: that option only enables the out-of-bag accuracy estimate (oob_score_), while the impurity-based feature_importances_ is available after any fit. The feature importances tell you which features (columns of the data matrix) are influential. To get the words, go back to the TF-IDF vectorizer and read its vocabulary_ attribute (note the trailing underscore), which is a dict mapping words to column indices; invert it to map each important column index back to its word.
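A minimal sketch of the steps above, assuming scikit-learn. The toy corpus, the labels, and the choice to list the top 5 words are illustrative only and are not the asker's data or settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus with two classes (0 = complaint, 1 = praise).
docs = [
    "refund my order please", "the package arrived broken", "want a refund now",
    "great product works well", "love this product", "works great thanks",
]
labels = [0, 0, 0, 1, 1, 1]

# TF-IDF without English stop words, as in the question.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# No oob_score needed: feature_importances_ is set by fit().
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, labels)

# vocabulary_ maps word -> column index; invert it to get column index -> word.
index_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}

# Sort columns by importance (descending) and keep the top 5.
importances = clf.feature_importances_
top = np.argsort(importances)[::-1][:5]
top_words = [(index_to_word[i], importances[i]) for i in top]

for word, score in top_words:
    print(f"{word:>10s}  {score:.4f}")
```

In scikit-learn 1.0 and later, vectorizer.get_feature_names_out() returns the same column-to-word mapping as an array, so index_to_word[i] can be replaced by indexing into that array.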

For an explanation of the vocabulary_ attribute, see this post: sklearn : TFIDF Transformer : How to get tf-idf values of given words in document

answered Nov 19 '25 at 00:11 by Dthal



