Does scikit-learn's DecisionTreeRegressor do true multi-output regression?

I have run into an ML problem that requires us to use a multi-dimensional Y. Right now we are training independent models on each dimension of this output, which does not take advantage of the additional information available from the fact that the outputs are correlated.

I have been reading this to learn more about the few ML algorithms that have been truly extended to handle multi-dimensional outputs. Decision trees are one of them.

Does scikit-learn use "multi-target regression trees" when fit(X, Y) is given a multi-dimensional Y, or does it fit a separate tree for each dimension? I spent some time looking at the code but didn't figure it out.
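
For concreteness, this is the kind of call I mean (toy data, just to show the shapes involved):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
# Two correlated target columns (illustrative data only)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])

reg = DecisionTreeRegressor(random_state=0).fit(X, Y)  # Y has shape (100, 2)
print(reg.predict(X[:5]).shape)                        # -> (5, 2)
```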

Pavel Komarov asked Oct 12 '25


1 Answer

After more digging, the only difference between a tree fit on points with a single-dimensional Y and one fit on points with multi-dimensional labels is the Criterion object used to decide splits. A Criterion can handle multi-dimensional labels directly, so fitting a DecisionTreeRegressor yields a single regression tree regardless of the dimensionality of Y.
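
Roughly, the MSE criterion pools the per-output impurity (an average over the output dimensions), so one split score covers all outputs at once. A minimal Python sketch of that idea, not the actual Cython implementation:

```python
import numpy as np

def node_impurity(Y_node):
    # Multi-output MSE impurity: the per-output variances are averaged,
    # so every output dimension contributes to the same impurity number.
    return np.mean(np.var(Y_node, axis=0))

def split_impurity(Y_left, Y_right):
    # Weighted impurity of a candidate split; the tree keeps the split that
    # minimizes this, so the choice is made jointly over all outputs rather
    # than separately per output dimension.
    n_l, n_r = len(Y_left), len(Y_right)
    return (n_l * node_impurity(Y_left) + n_r * node_impurity(Y_right)) / (n_l + n_r)
```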

This implies that, yes, scikit-learn does use true multi-target regression trees, which can leverage correlated outputs to positive effect.
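
One way to check this on a fitted estimator, using the public n_outputs_ and tree_ attributes of DecisionTreeRegressor (toy data again):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.RandomState(0).rand(100, 3)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])
reg = DecisionTreeRegressor(random_state=0).fit(X, Y)

print(reg.n_outputs_)         # 2: both targets are handled by one fitted tree
print(reg.tree_.node_count)   # a single tree structure, not one per output
print(reg.tree_.value.shape)  # (node_count, 2, 1): each node stores a 2-D mean
```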

Pavel Komarov answered Oct 14 '25