
How do I output the regression prediction from each tree in a Random Forest in Python scikit-learn?

Is there a way to get the predictions from every tree in a random forest in addition to the combined prediction? I would like to output all of the predictions in a list without viewing the entire tree. I know that I can get the leaf indices using the apply method, but I'm not sure how to use them to get the value from each leaf.

Edit: Here's what I have so far, based on the comments below. It wasn't clear to me before that the individual trees are accessible through the estimators_ attribute, but it seems that predict can be called on each of them. Is this the best way to do this, though?

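from sklearn.ensemble import RandomForestRegressor

numberTrees = 100
clf = RandomForestRegressor(n_estimators=numberTrees)
clf.fit(X, Y)
# each fitted DecisionTreeRegressor is stored in clf.estimators_
# (irow() was removed from pandas; iloc[[1]] selects row 1 as a 2-D frame)
for tree in clf.estimators_:
    print(tree.predict(val.iloc[[1]].to_numpy()))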
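A minimal sketch of the same loop collected into one NumPy array instead of printed line by line. It assumes a synthetic dataset from make_regression in place of the question's X, Y and val, which aren't shown. For a RandomForestRegressor, the forest's prediction is the mean of the per-tree predictions, so averaging over the trees should reproduce clf.predict():

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the question's X, Y (assumption: any numeric data works)
X, Y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
clf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

X_new = X[:5]  # a few rows to predict
# Row t holds tree t's predictions; column j is sample j
per_tree = np.stack([tree.predict(X_new) for tree in clf.estimators_])
print(per_tree.shape)  # (100, 5)

# The forest's regression output is the average over its trees
print(np.allclose(per_tree.mean(axis=0), clf.predict(X_new)))  # True
```

per_tree[:, j] is then the list of all 100 tree predictions for sample j.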
chunky asked Dec 16 '13 16:12


2 Answers

I'm pretty sure that what you have up there is about the best you can do. As you noted, predict() returns the prediction for the whole forest, not for its component trees. It can return a matrix, but only when multiple targets are learned together; in that case it returns one prediction per target, not one per tree. R's randomForest package can return the individual tree predictions with predict.all=TRUE, but sklearn has no equivalent. If you used apply(), you'd get a matrix of leaf indices, and you'd still have to iterate over the trees to look up the prediction for each tree/leaf combination. So I think what you have is about as good as it gets.
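To make the apply() route concrete, here is a rough sketch assuming single-output regression and synthetic data from make_regression. apply() returns one leaf index per sample per tree, and each fitted tree's low-level tree_.value array stores the value predicted at every node (for a regressor, the mean target in that node). Indexing it with those leaf indices recovers the per-tree predictions, but it still means looping over estimators_, so it is no shortcut:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

X_new = X[:3]
leaves = forest.apply(X_new)  # shape (n_samples, n_estimators): leaf id per tree

# For a regressor, tree_.value has shape (node_count, n_outputs, 1);
# [leaf, 0, 0] is the single-output prediction stored at that leaf
from_leaves = np.column_stack([
    est.tree_.value[leaves[:, t], 0, 0]
    for t, est in enumerate(forest.estimators_)
])

# Same numbers as calling predict() on each tree directly
direct = np.column_stack([est.predict(X_new) for est in forest.estimators_])
print(from_leaves.shape)                 # (3, 10)
print(np.allclose(from_leaves, direct))  # True
```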

Dthal answered Oct 16 '22 19:10


I had the same issue, and I don't know how you got the right answer using print(clf.estimators_[tree].predict(val.irow(1))). With a RandomForestClassifier it gave me seemingly random numbers instead of the actual class. After reading the scikit-learn source code, I realized that we have to use predict_proba() instead of predict(): it returns the tree's class probabilities, with the columns in the same order as clf.classes_, so the predicted class is the column with the highest value. For example:

tree_num = 2
tree_pred = clf.estimators_[tree_num].predict_proba(data_test)
print(clf.classes_)  # gives you the order of the classes
print(tree_pred)     # one row per instance: 0s, with 1 for the predicted class
# ['class1' 'class2' 'class3']
# [[0. 1. 0.]]
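A likely explanation for the "random numbers": a RandomForestClassifier encodes the labels as integer indices into clf.classes_ before fitting its trees, so each tree's predict() returns those indices as floats (0.0, 1.0, ...) rather than the labels themselves. A minimal sketch, assuming the iris dataset with string labels as a stand-in for the answer's 'class1', 'class2', 'class3', showing that indexing clf.classes_ with tree.predict() gives the same labels as taking the argmax of tree.predict_proba():

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
# String labels so the encoding is visible (stand-in for 'class1', ...)
y = iris.target_names[iris.target]
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(iris.data, y)

data_test = iris.data[:2]
tree = clf.estimators_[2]

raw = tree.predict(data_test)            # encoded class indices, as floats
labels = clf.classes_[raw.astype(int)]   # map the indices back to labels
proba = tree.predict_proba(data_test)    # one column per entry of clf.classes_

print(raw, labels)
print(clf.classes_[proba.argmax(axis=1)])  # same labels via predict_proba
```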

You can also call clf.predict_proba() on the forest itself: it returns the probability of each class averaged over all the trees, which saves you from going through each tree yourself:

x = clf.predict_proba(data_test)  # assume data_test has two instances
print(clf.classes_)
print(x)
# ['class1' 'class2' 'class3']
# [[0.12 0.02 0.86]   <- probabilities for the first instance
#  [0.35 0.01 0.64]]  <- probabilities for the second instance
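The "accumulation of trees" is an average: in scikit-learn, RandomForestClassifier.predict_proba() returns the mean of the individual trees' predict_proba() outputs. A quick check, assuming the iris dataset as a stand-in for the answer's data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

data_test = X[[0, 100]]  # two instances, as in the answer
# Per-tree probabilities: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([t.predict_proba(data_test) for t in clf.estimators_])
forest_proba = clf.predict_proba(data_test)

print(clf.classes_)  # [0 1 2]
print(forest_proba)  # each row sums to 1
print(np.allclose(per_tree.mean(axis=0), forest_proba))  # True
```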
pegah answered Oct 16 '22 20:10