I am trying to create a Random Forest regression model on one of my datasets. I need to find the order of importance of each variable, along with the variable names. I have tried a few things but can't get what I want. Below is the sample code I tried on the Boston Housing dataset:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np
boston = load_boston()
rf=RandomForestRegressor(max_depth=50)
idx=range(len(boston.target))
np.random.shuffle(idx)
rf.fit(boston.data[:500], boston.target[:500])
instance=boston.data[[0,5, 10]]
print rf.predict(instance[0])
print rf.predict(instance[1])
print rf.predict(instance[2])
important_features=[]
for x,i in enumerate(rf.feature_importances_):
    important_features.append(str(x))
print 'Most important features:',', '.join(important_features)
Most important features: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
If I print this:
impor = rf.feature_importances_
impor
I get below output:
array([ 3.45665230e-02, 4.58687594e-04, 5.45376404e-03,
3.33388828e-04, 2.90936201e-02, 4.15908448e-01,
1.04131089e-02, 7.26451301e-02, 3.51628079e-03,
1.20860975e-02, 1.40417760e-02, 8.97546838e-03,
3.92507707e-01])
I need to get the names associated with these values and then pick the top n out of these features.
First, you are using the wrong name for the variable. You are using important_features. Use feature_importances_ instead. Second, it returns an array of shape [n_features,] which contains the importance value of each feature. You need to sort the features by those values, in decreasing order, to get the most important ones.
See the RandomForestRegressor documentation
Edit: Added code
important_features_dict = {}
for idx, val in enumerate(rf.feature_importances_):
    important_features_dict[idx] = val
important_features_list = sorted(important_features_dict,
                                 key=important_features_dict.get,
                                 reverse=True)
print(f'5 most important features: {important_features_list[:5]}')
This prints the indices of the features in decreasing order of importance (the first is the most important, and so on).
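To get names rather than indices, one option is to pair each importance with its column name before sorting. The following is only a sketch: top_n_features is a hypothetical helper, the importances are copied from the output in the question, and the names list is assumed to match what boston.feature_names holds for this dataset.

```python
def top_n_features(importances, names, n):
    """Return the n (name, importance) pairs with the largest importance."""
    if len(importances) != len(names):
        raise ValueError("importances and names must have the same length")
    # Pair each name with its score, then sort by score, largest first.
    ranked = sorted(zip(names, importances),
                    key=lambda pair: pair[1],
                    reverse=True)
    return ranked[:n]


# Importances copied from the output in the question; with a fitted model
# you would pass rf.feature_importances_ instead.
importances = [3.45665230e-02, 4.58687594e-04, 5.45376404e-03,
               3.33388828e-04, 2.90936201e-02, 4.15908448e-01,
               1.04131089e-02, 7.26451301e-02, 3.51628079e-03,
               1.20860975e-02, 1.40417760e-02, 8.97546838e-03,
               3.92507707e-01]

# Assumed column names for the Boston data (stand-in for boston.feature_names).
names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
         "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

top3 = top_n_features(importances, names, 3)
for name, score in top3:
    print(f"{name}: {score:.4f}")
```

With a fitted model, the call would look like top_n_features(rf.feature_importances_, boston.feature_names, n), assuming the names are in the same column order as the training data.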