I want to classify text using sklearn. First I used a bag of words to train on the data, but the bag-of-words representation has a very large number of features, more than 10000, so I reduced it to 100 features with SVD.
But now I also want to add some other features, like the number of words, the number of positive words, the number of pronouns, etc. These are only about 10 additional features, which is very small compared to the 100 bag-of-words features.
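Roughly, the setup looks like this (a simplified, self-contained sketch; the random corpus, labels, and handcrafted counts are just placeholders standing in for my real data):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Placeholder corpus and labels standing in for my real data
rng = np.random.RandomState(0)
vocab = ["word%d" % i for i in range(300)]
texts = [" ".join(rng.choice(vocab, size=50)) for _ in range(200)]
y = rng.randint(0, 2, size=200)

# Bag of words (on my real data this gives more than 10000 features)
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(texts)

# Reduce the bag-of-words features to 100 dimensions with SVD
svd = TruncatedSVD(n_components=100, random_state=0)
X_svd = svd.fit_transform(X_bow)

# ~10 handcrafted features: # of words, # of positive words, # of pronouns, ...
extra_features = rng.randint(0, 20, size=(200, 10))

# Final feature matrix: 100 SVD components + 10 extra features per document
X = np.hstack([X_svd, extra_features])
print(X.shape)  # (200, 110)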
This situation raises two questions for me:
Although I'm very interested, I don't know the answer to the main question. In the meantime, I can help with the second one.
After fitting a model you can access the feature importances through the attribute model.feature_importances_ (this attribute is exposed by tree-based estimators such as RandomForestClassifier and GradientBoostingClassifier).
I use the following function to normalize the importances and show them in a prettier way; X_cols is the list of feature names, passed in alongside the fitted model.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # (optional, only for nicer plot styling)

def showFeatureImportance(model, X_cols):
    # FEATURE IMPORTANCE
    # Get the feature importances from the fitted classifier
    feature_importance = model.feature_importances_

    # Normalize the importances relative to the largest one
    feature_importance = 100.0 * (feature_importance / feature_importance.max())
    sorted_idx = np.argsort(feature_importance)
    pos = np.arange(sorted_idx.shape[0]) + .5

    # Plot relative feature importance
    plt.figure(figsize=(12, 12))
    plt.barh(pos, feature_importance[sorted_idx], align='center', color='#7A68A6')
    plt.yticks(pos, np.asanyarray(X_cols)[sorted_idx])
    plt.xlabel('Relative Importance')
    plt.title('Feature Importance')
    plt.show()
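For example, you could use it with a tree-based classifier such as RandomForestClassifier (one of the estimators that actually exposes feature_importances_). This is just a sketch: the make_classification call and the svd_*/extra_* column names are stand-ins for your real combined matrix of 100 SVD components plus the extra features.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the real data: 100 "SVD" columns + 10 extra columns
X, y = make_classification(n_samples=200, n_features=110, random_state=0)
X_cols = ["svd_%d" % i for i in range(100)] + ["extra_%d" % i for i in range(10)]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

showFeatureImportance(model, X_cols)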