Most important features Gaussian Naive Bayes classifier python sklearn

I am trying to get the most important features for my GaussianNB model. The code from "How to get most informative features for scikit-learn classifiers?" and "How to get most informative features for scikit-learn classifier for different class?" only works when I use MultinomialNB. How else can I calculate or retrieve the most important features for each of my two classes (Fault = 1 or Fault = 0)? My code is below (it is not applied to text data):

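from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import GaussianNB

# Convert the Spark DataFrame to a pandas DataFrame
df = df.toPandas()

# X_df holds the binary feature columns; FAULT is the 0/1 target
X = X_df.values
Y = df['FAULT'].values  # keep the target 1-D, as scikit-learn expects

gnb = GaussianNB()
# Fit and predict on the same data, so the scores below are training scores
y_pred = gnb.fit(X, Y).predict(X)

print(confusion_matrix(Y, y_pred))
print(accuracy_score(Y, y_pred))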

Where X_df is a dataframe with binary columns for each of my features.

LN_P asked Nov 16 '22 20:11


1 Answer

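This is how I tried to understand the important features of a Gaussian NB model. A fitted scikit-learn GaussianNB model exposes the parameters theta_ and sigma_, which hold the mean and the variance of each feature per class, respectively (sigma_ was renamed var_ in scikit-learn 1.0). For example, in a binary classification problem, model.theta_ and model.sigma_ each contain two arrays, one per class, with one value per feature. The code below sorts the features of the first class by mean and then by variance, and prints the ten with the smallest values: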

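import numpy as np

# count_vect is a fitted CountVectorizer; for a DataFrame, use its column names instead.
# get_feature_names_out() replaces get_feature_names() from scikit-learn 1.0 onward.
feature_names = np.asarray(count_vect.get_feature_names_out())

# Ten features with the lowest mean for the first class (model.classes_[0])
neg = model.theta_[0].argsort()
print(feature_names[neg[:10]])

print('')

# Ten features with the lowest variance for the first class
neg = model.var_[0].argsort()  # this attribute is called sigma_ before scikit-learn 1.0
print(feature_names[neg[:10]])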
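For non-text data such as the question's binary columns there is no count_vect, so the names would have to come from somewhere else. The following is only a minimal sketch on made-up data: the feature_names array is a stand-in for something like X_df.columns, and the synthetic X and y are assumptions, not the asker's dataset. It shows the shape of theta_ and lists the features of each class from highest to lowest mean.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data (hypothetical): 200 rows, 4 features.
rng = np.random.default_rng(0)
feature_names = np.array(["f0", "f1", "f2", "f3"])  # stand-in for X_df.columns
y = rng.integers(0, 2, size=200)
X = rng.integers(0, 2, size=(200, 4)).astype(float)
X[:, 2] = y  # make f2 follow the label exactly

model = GaussianNB().fit(X, y)

# theta_ has shape (n_classes, n_features); row i belongs to model.classes_[i].
print(model.theta_.shape)

for row, label in enumerate(model.classes_):
    # argsort is ascending, so reverse it to put the highest per-class mean first.
    order = model.theta_[row].argsort()[::-1]
    print(label, feature_names[order])
```

Indexing by position in model.classes_ rather than by a hard-coded 0 or 1 keeps the rows matched to the right label even when the class labels are not 0 and 1.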

This is how I tried to get the important features of a class using Gaussian Naive Bayes in the scikit-learn library.
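Sorting one class's means or variances on its own does not say how well a feature separates Fault = 1 from Fault = 0. One possible heuristic, which is my own assumption and not something scikit-learn provides, is to score each feature by the gap between its two class means divided by the combined spread. In the sketch below, rank_by_class_separation, the synthetic data, and the feature names are all hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def rank_by_class_separation(model, feature_names):
    """Rank features of a fitted two-class GaussianNB by |mean gap| / combined std."""
    # var_ is the scikit-learn >= 1.0 name; older releases call it sigma_.
    variances = model.var_ if hasattr(model, "var_") else model.sigma_
    gap = np.abs(model.theta_[1] - model.theta_[0])
    score = gap / np.sqrt(variances[0] + variances[1])
    order = np.argsort(score)[::-1]  # largest separation first
    return [(feature_names[i], float(score[i])) for i in order]


# Hypothetical data: three noise columns and one column that mostly follows y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
noise = rng.integers(0, 2, size=(500, 3))
signal = np.where(rng.random(500) < 0.9, y, 1 - y)  # agrees with y about 90% of the time
X = np.column_stack([noise, signal]).astype(float)
names = ["noise_a", "noise_b", "noise_c", "signal"]

model = GaussianNB().fit(X, y)
ranking = rank_by_class_separation(model, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The sign of model.theta_[1] - model.theta_[0] then indicates which class a high value of the feature points toward, which gives a rough per-class reading. This score treats each feature independently, in line with the naive Bayes assumption, so it will not pick up interactions between features.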

Rajesh Somasundaram answered Nov 19 '22 08:11