Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally?
All the implementations I've found require you to provide all the features to the classifier at once in order to get a classification. However, in my application, I have hundreds of features, and some of the features are functions that can take a long time to evaluate. Since not all branches of the tree may use all the features, it doesn't make sense to give the classifier all the features at once. I'd like the classifier to ask for features one at a time, in the order needed to make the maximum reduction in entropy, and then provide a final classification.
I believe there is no such implementation, but decision trees are simple enough to implement that you shouldn't have any trouble writing such a program on your own.
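As a minimal sketch of what such a hand-rolled tree could look like (the `Node` layout, feature names, and callables below are illustrative, not any library's API), `predict()` only calls a feature's possibly expensive function when the traversal actually reaches a node that splits on it:

```python
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # name of the feature this node splits on
        self.threshold = threshold  # split threshold for numeric features
        self.left = left            # subtree for value <= threshold
        self.right = right          # subtree for value > threshold
        self.label = label          # class label if this is a leaf

def predict(node, feature_funcs, cache=None):
    """Walk the tree, computing each feature at most once, and only when a node asks for it."""
    cache = {} if cache is None else cache
    while node.label is None:
        if node.feature not in cache:
            cache[node.feature] = feature_funcs[node.feature]()  # expensive evaluation happens here
        value = cache[node.feature]
        node = node.left if value <= node.threshold else node.right
    return node.label

# Usage: feature_funcs maps feature names to zero-argument callables.
tree = Node("f1", 0.5,
            left=Node(label="A"),
            right=Node("f2", 10, left=Node(label="B"), right=Node(label="C")))
funcs = {"f1": lambda: 0.3, "f2": lambda: 42}   # f2 is never evaluated for this instance
print(predict(tree, funcs))                      # -> "A"
```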
On the other hand, I don't think computing features on the fly will increase speed, because even if a feature was already used for an earlier split, it still has to be considered at the remaining splits, so for many records it would be recomputed many times (it may save memory, though).
This could make sense in the case of a random forest, where only a random, limited subset of features is considered at each split. Still, an RF is usable only as a classifier; it won't build you nice, human-interpretable decision trees.
Usually such packages (J48 trees in Weka in particular) allow you to specify a missing value in place of a feature value, which will be handled the same way the C4.5 algorithm handles it:
when we reach a node that splits on the attribute whose value is missing, we send the instance down each possible branch with a weight proportional to the number of training instances that went down those branches, eventually accumulating the results at the leaf nodes.
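Here is a rough Python sketch of that prediction-time behavior, assuming each node records how many training instances reached it (the node fields and helper names are made up for illustration, not Weka's API):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # attribute this node splits on
    threshold: Optional[float] = None  # split point for numeric attributes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # class label if this is a leaf
    n_train: int = 0                   # training instances that reached this node

def classify(node, instance, weight=1.0):
    """Return a Counter mapping class label -> accumulated instance weight."""
    if node.label is not None:                       # leaf: deposit the weight here
        return Counter({node.label: weight})
    value = instance.get(node.feature)               # None means the value is missing
    if value is None:
        total = node.left.n_train + node.right.n_train
        result = Counter()
        for child in (node.left, node.right):        # split the instance fractionally
            result += classify(child, instance, weight * child.n_train / total)
        return result
    child = node.left if value <= node.threshold else node.right
    return classify(child, instance, weight)

# Example: the root splits on "f1", which is missing from the instance.
left = Node(label="A", n_train=30)
right = Node(label="B", n_train=10)
root = Node(feature="f1", threshold=0.5, left=left, right=right, n_train=40)
print(classify(root, {"f2": 1.0}))   # -> weights ~ {'A': 0.75, 'B': 0.25}
```

The predicted class is simply the label with the largest accumulated weight.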
Of course you could take a more aggressive approach and change the way the tree chooses attributes to split on during the training phase. A naive way would be to assign a weight to each attribute and multiply the splitting criterion (entropy, information gain, ...) by that weight as a penalty coefficient, so that "expensive" attributes are less likely to be chosen as split nodes.
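As a rough sketch of that penalty idea (the attribute weights, helper names, and the numeric-split representation are assumptions, not a standard API), you could scale each candidate split's information gain by its attribute's weight when picking the best split:

```python
import math
from collections import Counter

def entropy(labels):
    if not labels:
        return 0.0
    counts, n = Counter(labels), len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attribute, threshold):
    left = [y for x, y in zip(rows, labels) if x[attribute] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[attribute] > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

def best_split(rows, labels, candidate_splits, weights):
    """candidate_splits: list of (attribute, threshold); weights: attribute -> penalty in (0, 1]."""
    return max(candidate_splits,
               key=lambda s: weights[s[0]] * information_gain(rows, labels, s[0], s[1]))

# Usage: the expensive "slow" feature is penalized, so "cheap" wins the split.
rows = [{"cheap": 1, "slow": 7}, {"cheap": 3, "slow": 2}, {"cheap": 5, "slow": 9}]
labels = ["A", "B", "B"]
weights = {"cheap": 1.0, "slow": 0.2}
print(best_split(rows, labels, [("cheap", 2), ("slow", 5)], weights))   # -> ('cheap', 2)
```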