I am trying to get the most important features for my GaussianNB model. The code from "How to get most informative features for scikit-learn classifiers?" or "How to get most informative features for scikit-learn classifier for different class?" only works when I use MultinomialNB. How can I calculate or retrieve the most important features for each of my two classes (Fault = 1 or Fault = 0) otherwise? My code (not applied to text data) is:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

df = df.toPandas()
X = X_df.values
Y = df['FAULT'].values  # keep Y 1-D; a (-1, 1) reshape triggers a column-vector warning in fit()
gnb = GaussianNB()
y_pred = gnb.fit(X, Y).predict(X)
print(confusion_matrix(Y, y_pred))
print(accuracy_score(Y, y_pred))
Where X_df is a dataframe with binary columns for each of my features.
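One model-agnostic way to answer "which features matter" for a fitted GaussianNB is permutation importance: shuffle one column at a time and measure how much the score drops. A minimal sketch on synthetic data (the `make_classification` matrix is a placeholder, not the actual FAULT dataset):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.naive_bayes import GaussianNB

# Placeholder data standing in for X_df.values / df['FAULT'].values
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
gnb = GaussianNB().fit(X, y)

# Shuffle each column n_repeats times; a bigger accuracy drop means a more important feature.
result = permutation_importance(gnb, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```

This ranks features globally rather than per class, but it works for any fitted estimator, including GaussianNB.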
This is how I tried to understand the important features of the Gaussian NB. Scikit-learn's GaussianNB exposes the fitted parameters theta_ and sigma_, which hold the mean and variance, respectively, of each feature per class (for a binary classification problem, model.theta_ and model.sigma_ each return two arrays, one per class; in scikit-learn 1.0+ sigma_ is renamed var_).
import numpy as np

neg = gnb.theta_[0].argsort()  # ascending: features with the smallest class-0 mean first
print(np.take(X_df.columns, neg[:10]))
print('')
neg = gnb.sigma_[0].argsort()  # ascending: features with the smallest class-0 variance first
print(np.take(X_df.columns, neg[:10]))
This is how I tried to get the important features per class using Gaussian Naive Bayes in scikit-learn.