How to get feature importance in Naive Bayes?

I have a dataset of reviews, each with a class label of positive or negative, and I am applying Naive Bayes to it. First, I convert the text into a bag of words. Here sorted_data['Text'] holds the reviews and final_counts is a sparse matrix:

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
final_counts = count_vect.fit_transform(sorted_data['Text'].values)

I then split the data into training and test sets:

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_1, X_test, y_1, y_test = train_test_split(final_counts, labels, test_size=0.3, random_state=0)

I then apply the Naive Bayes algorithm as follows:

from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

optimal_alpha = 1
NB_optimal = BernoulliNB(alpha=optimal_alpha)

# fit the model on the training split
NB_optimal.fit(X_1, y_1)

# predict the class of each test document
pred = NB_optimal.predict(X_test)

# evaluate accuracy as a percentage
acc = accuracy_score(y_test, pred) * 100
print('\nThe accuracy of the NB classifier for alpha = %d is %f%%' % (optimal_alpha, acc))

Here X_test is the test set, and pred holds the predicted class (positive or negative) for each vector in X_test.

The shape of X_test is (54626, 82343), i.e. 54626 rows and 82343 dimensions.

The length of pred is 54626.

My question: I want to get the words with the highest probability in each vector, so that I can tell from the words why it was predicted as positive or negative. How can I get the words with the highest probability in each vector?

asked May 25 '18 10:05

merkle

2 Answers

You can get the importance of each word out of the fitted model by using the feature_log_prob_ attribute (older scikit-learn versions also exposed coef_ on Naive Bayes models). For example:

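import numpy as np

# Feature indices sorted by descending log probability; rows follow NB_optimal.classes_
neg_class_prob_sorted = NB_optimal.feature_log_prob_[0, :].argsort()[::-1]
pos_class_prob_sorted = NB_optimal.feature_log_prob_[1, :].argsort()[::-1]

# get_feature_names_out() requires scikit-learn >= 1.0; older versions use get_feature_names()
feature_names = count_vect.get_feature_names_out()
print(np.take(feature_names, neg_class_prob_sorted[:10]))
print(np.take(feature_names, pos_class_prob_sorted[:10]))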

This prints the 10 words with the highest log probability for each of your classes. Words that are common in both classes can appear in both lists.

answered Sep 28 '22 17:09

piman314

def get_salient_words(nb_clf, vect, class_ind):
    """Return salient words for the given class.

    Parameters
    ----------
    nb_clf : a fitted Naive Bayes classifier (e.g. MultinomialNB, BernoulliNB)
    vect : the fitted CountVectorizer
    class_ind : int
        Row index into nb_clf.feature_log_prob_ (same order as nb_clf.classes_).

    Returns
    -------
    list
        (word, log prob) tuples sorted by log probability in descending order.
    """
    # get_feature_names_out() requires scikit-learn >= 1.0; older versions use get_feature_names()
    words = vect.get_feature_names_out()
    zipped = list(zip(words, nb_clf.feature_log_prob_[class_ind]))
    sorted_zip = sorted(zipped, key=lambda t: t[1], reverse=True)

    return sorted_zip

neg_salient_top_20 = get_salient_words(NB_optimal, count_vect, 0)[:20]
pos_salient_top_20 = get_salient_words(NB_optimal, count_vect, 1)[:20]

answered Sep 28 '22 17:09

dimid

Tags: python, python-3.x, machine-learning, naivebayes, scikit-learn
