If I have a trained random forest, is there any way for me to obtain the number of votes that each class got by the forest on a test sample? A percentage of votes would be even better.
Something like CvRTrees::predict, but getting the raw output along with the predicted class.
Thanks
Edit: To further explain my goal, so that I might get an answer that solves my problem and not just my question.
To answer how much I know: very little. This is a real-world application, and I am trying to get myself up to speed on all of this as quickly as possible.
Essentially, I am researching discriminative classifiers, with the requirement that I be able to compare the output of two (or more) independent classifiers. By independent I mean that each classifier may or may not know about the entire set of classes; however, there exists a set of classes of which every classifier covers some subset.
My initial thought is to collect meta-information about the classification from each of the classifiers, which ideally would take some form like "there is a 15% chance of it being A and a 78% chance of it being B" [I know chance is a bad word, but I will leave it]. If I could get that output, I could perform a final classification based on dynamic performance weights assigned to each classifier.
The idea is that a very simple rule-based classifier can do the initial classifications while the more exotic classifier has time to train. Ideally, the learning classifier could support more classes than the rule-based one, and over time it becomes the one primarily used.
I was dealing with the same issue and I would like to share my solution here. I derived a class from CvRTrees and added a function with the desired behavior, using the existing predict() function as my starting point. Here is my code:
class CvRTreesMultiClass : public CvRTrees
{
public:
    int predict_multi_class( const CvMat* sample,
                             cv::AutoBuffer<int>& out_votes,
                             const CvMat* missing = 0 ) const;
};
with:
#include <stdexcept> // for std::runtime_error
#include <string>

int CvRTreesMultiClass::predict_multi_class( const CvMat* sample,
                                             cv::AutoBuffer<int>& out_votes,
                                             const CvMat* missing ) const
{
    int result = 0;
    if( nclasses > 0 ) // classification
    {
        // out_votes must already be allocated with at least nclasses entries
        int* votes = out_votes;
        memset( votes, 0, sizeof(*votes)*nclasses );
        // each tree casts one vote for the class of the leaf it lands in
        for( int k = 0; k < ntrees; k++ )
        {
            CvDTreeNode* predicted_node = trees[k]->predict( sample, missing );
            int class_idx = predicted_node->class_idx;
            CV_Assert( 0 <= class_idx && class_idx < nclasses );
            ++votes[class_idx];
        }
        result = ntrees; // total number of votes cast
    }
    else // regression
    {
        // __FUNCTION__ is not a string literal, so it cannot be
        // concatenated by juxtaposition; build the message explicitly
        throw std::runtime_error( std::string(__FUNCTION__) +
                                  " can only be used for classification" );
    }
    return result;
}
After calling this function I simply calculate each class's probability from the number of votes it received: prob = (double)out_votes[class_index] / result. (The cast matters; plain integer division would truncate to zero.) I think this is what the OP was looking for (at least it was what I needed).
Are you doing binary classification? If so, you can use CvRTrees::predict_prob(). It returns a value between 0 and 1: the proportion of trees deciding that the given point belongs to the second class.
If you have more than two classes, the proportion of trees classifying a given point into a particular class is not really a good indicator of confidence. A better approach is CvRTrees::get_proximity(). How to use it depends on your application. Say you have, for each class, a reference point that belongs to that class with high probability. You first classify the given point; then, to check the quality of the classification, you use get_proximity to measure the proportion of trees that treat the given point and the reference point of the predicted class the same way.
Your question is quite limited, and it's unclear how much you know about measuring the confidence of discriminative classifiers. There is much more to this if you're working on a serious real-world project; if it's just homework or an exercise, then perhaps the above suffices.