how to change feature weight when training a model with sklearn?

I want to classify text using sklearn. First I trained on a bag-of-words representation, but bag of words produces a very large feature space, more than 10,000 features, so I reduced it to 100 features with SVD.

But now I want to add some other features, like the number of words, the number of positive words, the number of pronouns, etc. There are only about 10 of these additional features, which is very small compared to the 100 bag-of-words features.
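The setup described above might be sketched roughly like this (the toy documents, the positive-word list, and the hand-crafted features are made up purely for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "I really love this movie",
    "he hates that movie a lot",
    "she likes it very much",
]

# Bag of words, then reduce dimensionality with truncated SVD
bow = CountVectorizer().fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(bow)  # shape (n_docs, 2)

# Hand-crafted extra features: word count and count of "positive" words
# (the positive-word list here is a stand-in, not a real lexicon)
positive = {"love", "likes"}
extra = np.array(
    [[len(d.split()), sum(w in positive for w in d.split())] for d in docs]
)

# Combine both feature groups into one matrix for the classifier
X = np.hstack([reduced, extra])  # shape (n_docs, 2 + 2)
```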

This raises two questions:

  1. Is there a function in sklearn that can increase the additional features' weight, to make them more important?
  2. How do I check whether an additional feature is important to the classifier?
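For what it's worth, one way to express question 1 in sklearn is a FeatureUnion, whose transformer_weights parameter multiplies each transformer's output before the features are stacked. A minimal sketch, with made-up documents and a toy word-count transformer (not a definitive recipe):

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import FunctionTransformer

docs = ["I love this movie", "that was so bad", "great great film"]

# A stand-in for the hand-crafted features: just the word count per document
word_count = FunctionTransformer(
    lambda texts: np.array([[len(t.split())] for t in texts])
)

features = FeatureUnion(
    [
        ("bow_svd", Pipeline([
            ("bow", CountVectorizer()),
            ("svd", TruncatedSVD(n_components=2, random_state=0)),
        ])),
        ("extra", word_count),
    ],
    # Upweight the extra features by a factor of 5 relative to bag of words
    transformer_weights={"bow_svd": 1.0, "extra": 5.0},
)

X = features.fit_transform(docs)  # 2 SVD columns + 1 weighted extra column
```

Whether a fixed multiplier actually makes the classifier treat those features as more important depends on the model (it matters for distance- or margin-based models; tree ensembles are largely scale-invariant).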
HAO CHEN asked Nov 28 '15

1 Answer

Although I'm very interested, I don't know the answer to the main question. In the meantime I can help with the second one.

After fitting a model you can access the feature importances through the attribute model.feature_importances_.
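Note that feature_importances_ is only available on tree-based models such as RandomForestClassifier or GradientBoostingClassifier; linear models expose coef_ instead. A quick check on synthetic data (the dataset here is generated just for demonstration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 200 samples, 5 features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One importance value per feature; the values sum to 1
print(model.feature_importances_)
```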

I use the following function to normalize the importance and show it in a prettier way.

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # optional, only for nicer default styling

def showFeatureImportance(model, feature_names):
    # Get feature importances from the fitted classifier
    feature_importance = model.feature_importances_

    # Normalize to a 0-100 scale relative to the most important feature
    feature_importance = 100.0 * (feature_importance / feature_importance.max())
    sorted_idx = np.argsort(feature_importance)
    pos = np.arange(sorted_idx.shape[0]) + .5

    # Plot relative feature importance as a horizontal bar chart
    plt.figure(figsize=(12, 12))
    plt.barh(pos, feature_importance[sorted_idx], align='center', color='#7A68A6')
    plt.yticks(pos, np.asanyarray(feature_names)[sorted_idx])
    plt.xlabel('Relative Importance')
    plt.title('Feature Importance')
    plt.show()
fernandosjp answered Oct 28 '22