I am using decision trees from scikit-learn to do regression on a data set.
I am getting very good results, but one issue that concerns me is that the relative uncertainty on many of the features is very high.
I have tried just dropping the cases with high uncertainty, but that reduces the performance of the model significantly.
The features themselves are experimentally determined, so they have associated experimental uncertainty. The data itself is not noisy.
So my question is: is there a good way to incorporate the uncertainty associated with the features into machine learning algorithms?
Thanks for all the help!
If the uncertain features are improving the model, that suggests that, taken together, they are useful. However, some of them individually may not be. My suggestion would be to get rid of the features that don't improve the model, for example with a greedy feature-elimination method such as recursive feature elimination (RFE):
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
RFE begins by training a model on all of the features, then discards the feature deemed least useful. It retrains the model with one fewer feature, and repeats this until the desired number of features remains.
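A minimal sketch of what that could look like with scikit-learn's `RFE` wrapped around a decision-tree regressor. The synthetic dataset and the parameter values (`n_features_to_select`, `step`) are illustrative assumptions, not from your setup:

```python
# Sketch: recursive feature elimination with a decision-tree regressor.
# The data here is synthetic; swap in your own X and y.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=4, random_state=0)

selector = RFE(
    estimator=DecisionTreeRegressor(random_state=0),
    n_features_to_select=4,  # assumed target count -- tune for your data
    step=1,                  # eliminate one feature per iteration
)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # rank 1 = kept; larger = eliminated earlier

X_reduced = X[:, selector.support_]  # data with only the kept features
```

You can then compare the model's cross-validated score on `X_reduced` against the full feature set to see whether the dropped features were actually helping.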
Hope that helps