 

Multi-output regression

I have been looking into multi-output regression for the last few weeks. I am working with the scikit-learn package. My machine learning problem has an input of 3 features and needs to predict two output variables. Some ML models in the sklearn package support multi-output regression natively. If a model does not support this, sklearn's multi-output regression wrapper can be used to convert it. The multioutput class fits one regressor per target.
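As a minimal sketch of that wrapper (the toy data and the SVR base estimator here are my assumptions, purely for illustration):

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Toy data for illustration: 3 input features, 2 target variables
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.column_stack([X[:, 0] + X[:, 1], X[:, 1] - X[:, 2]])

# SVR handles only a single target, so the wrapper clones it and
# fits one independent SVR per column of y
model = MultiOutputRegressor(SVR()).fit(X, y)
print(model.predict(X[:5]).shape)   # (5, 2) -- one prediction per target
print(len(model.estimators_))       # 2 -- one fitted regressor per target
```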

  1. Does the multioutput regressor class, or the algorithms that support multi-output regression natively, take the underlying relationship of the input variables into account?
  2. Instead of a multi-output regression algorithm should I use a Neural network?
Matthijs Visser asked Mar 20 '18


1 Answer

1) For your first question, I have divided the answer into two parts.

  • The first part is answered in the documentation you linked and also in this user guide topic, which states explicitly that:

    As MultiOutputRegressor fits one regressor per target it can not take advantage of correlations between targets.

  • The second part of your first question asks about other algorithms which do support this. For that you can look at the "inherently multiclass" part of the user guide. Inherently multi-class means that they don't need a One-vs-Rest or One-vs-One strategy to handle multiple classes (OvO and OvR fit multiple models to cover the classes and so may not use the relationship between targets); instead, they can structure the multi-class setting into a single model. That section lists the following:

    sklearn.naive_bayes.BernoulliNB
    sklearn.tree.DecisionTreeClassifier
    sklearn.tree.ExtraTreeClassifier
    sklearn.ensemble.ExtraTreesClassifier
    sklearn.naive_bayes.GaussianNB
    sklearn.neighbors.KNeighborsClassifier
    sklearn.semi_supervised.LabelPropagation
    sklearn.semi_supervised.LabelSpreading
    sklearn.discriminant_analysis.LinearDiscriminantAnalysis
    sklearn.svm.LinearSVC (setting multi_class=”crammer_singer”)
    sklearn.linear_model.LogisticRegression (setting multi_class=”multinomial”)
    ...
    ...
    ...
    

    Try replacing the 'Classifier' at the end with 'Regressor' and look at the documentation of the fit() method there. For example, take DecisionTreeRegressor.fit():

    y : array-like, shape = [n_samples] or [n_samples, n_outputs]
    
        The target values (real numbers). 
        Use dtype=np.float64 and order='C' for maximum efficiency.
    

    You can see that it accepts a 2-d array for the targets (y), so a single model handles all outputs and may be able to use the correlations and underlying relationships between the targets.
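    To illustrate the difference, here is a quick sketch (with made-up data) of DecisionTreeRegressor fitting a 2-d target array natively, in a single model:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: 3 features, 2 related targets
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.column_stack([X[:, 0] + X[:, 1], X[:, 1] - X[:, 2]])

# y is 2-d, but only ONE tree is built -- each leaf stores a value
# per output, so the split structure is shared across both targets
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
print(tree.n_outputs_)              # 2
print(tree.predict(X[:5]).shape)    # (5, 2)
```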

2) Now for your second question about whether to use a neural network: it depends on personal preference, the type of problem, the amount and type of data you have, and the amount of training you are willing to do. You can try multiple algorithms and choose the one that gives the best output for your data and problem.
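One way to follow that advice is to cross-validate a few candidates side by side. This sketch (the estimators and toy data are my assumptions, not part of the answer) uses scikit-learn's default R² score, which is averaged over the targets:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Toy multi-output data: 3 features, 2 targets
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = np.column_stack([X[:, 0] + X[:, 1], X[:, 1] - X[:, 2]])

candidates = {
    "decision_tree": DecisionTreeRegressor(random_state=0),  # native multi-output
    "mlp": MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                        max_iter=2000, random_state=0),      # native multi-output
    "wrapped_svr": MultiOutputRegressor(SVR()),              # one SVR per target
}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=3)  # default scorer: R^2
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```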

Vivek Kumar answered Oct 01 '22