Interpret random forest model for text classification

Question

I have a text dataset in which I have manually classified each record as one of two possible classes. I created a TF-IDF matrix on the corpus, sans English stopwords, trained/tested a Random Forest classifier, evaluated the model, and applied the model to a larger corpus of text. All is good so far, but how can I find out more about my model, i.e., how can I find out which words are "important" to the model?

Dthal · Accepted Answer

The trained RF has an attribute feature_importances_. It is available on any fitted model; you do not need oob_score=True in the constructor for it (that option only adds an out-of-bag accuracy estimate). The feature importances tell you which features (data matrix columns) are influential. To get the words, go back to the TF-IDF vectorizer and read its vocabulary_ attribute (note the trailing underscore), which is a dict from words to column indices.
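A minimal sketch of that mapping, assuming scikit-learn's TfidfVectorizer and RandomForestClassifier. The toy corpus, labels, and variable names below are made up for illustration and are not from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus and labels, invented for illustration only.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project meeting notes attached",
    "monday project status report",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF without English stopwords, as described in the question.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, labels)

# vocabulary_ maps word -> column index; invert it to index -> word.
index_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}

# One importance value per column of X; sort columns from most to least important.
importances = clf.feature_importances_
top = np.argsort(importances)[::-1][:5]
top_words = [(index_to_word[int(i)], float(importances[i])) for i in top]

for word, score in top_words:
    print("{}: {:.3f}".format(word, score))
```

Note that these importances say how much a word is used in splits, not which class it points toward, and with a corpus this small the ranking is not meaningful.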

For an explanation of the vocabulary_ attribute, see this post: sklearn : TFIDF Transformer : How to get tf-idf values of given words in document
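As a rough illustration of how vocabulary_ can be used to look up the TF-IDF weight of a given word in each document (the two sentences here are placeholder data, and this is only one way to do the lookup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents, invented for illustration only.
docs = ["the cat sat on the mat", "the dog chased the cat"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: rows = documents, columns = words

# Column index of the word "cat" in the TF-IDF matrix.
col = vectorizer.vocabulary_["cat"]

# TF-IDF weight of "cat" in every document.
cat_weights = X[:, col].toarray().ravel()
print(cat_weights)

# A word that does not occur in a document has weight 0 there.
dog_in_first_doc = X[0, vectorizer.vocabulary_["dog"]]
print(dog_in_first_doc)
```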

Tags:

python

python-2.7

nltk

scikit-learn

user1624577
