I am trying to figure out how to properly create regression ensembles. I know there are various options. I use the following approach.
First I define models like Linear Regression, GBM, etc. Then I run GridSearchCV for each of these models to find the best parameters. After this I want to make a final prediction that takes the individual predictions of each model into account.
The question is how to properly merge the individual predictions into a single Y vector. It seems that assigning a weight coefficient to each prediction is not suitable for a regression problem. If it is, how do I obtain such weight coefficients?
Maybe the good way is to train a meta-model using individual predictions as a training set?
Disclaimer: I have no personal experience with training ensembles, but I'm also interested in this topic.
- Most resources I have found on training ensembles deal with classification problems. A good article I have found, besides the Wikipedia articles, is http://mlwave.com/kaggle-ensembling-guide/ but for regression it only lists averaging. It is still possible to assign a weight coefficient to each model, e.g. based on cross-validation performance, and the prediction still makes sense: you just have to normalize the coefficients so that they sum up to 1.0.
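As a sketch of that weighted-averaging idea (assuming scikit-learn; the choice of inverse-MSE weights is just one illustrative option, not a prescribed method):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = [LinearRegression(), GradientBoostingRegressor(random_state=0)]

# Score each model with cross-validation (neg_mean_squared_error: higher is better).
scores = np.array([
    cross_val_score(m, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    for m in models
])

# Turn the scores into positive weights (inverse MSE) and normalize to sum to 1.0.
weights = 1.0 / -scores
weights /= weights.sum()

# Fit each model on the full data and blend the predictions with those weights.
for m in models:
    m.fit(X, y)
y_pred = sum(w * m.predict(X) for w, m in zip(weights, models))
```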
- Another option is boosting: you train your models one after the other, and each successive model is trained on the error of the previous ones. Meaning, if the ensemble so far predicted too high a value for a sample, the next model will try to predict a negative value for that sample (so that the sum of the models' predictions equals the real training target). This short paragraph on Wikipedia might help to understand it: https://en.wikipedia.org/wiki/Ensemble_learning#Boosting
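A minimal hand-rolled version of that residual idea (assuming scikit-learn; in practice you would use a ready-made implementation such as GradientBoostingRegressor, this is only to show the mechanics):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

n_rounds = 10
models = []
residual = y.copy()
for _ in range(n_rounds):
    m = DecisionTreeRegressor(max_depth=3, random_state=0)
    m.fit(X, residual)           # fit on the current error of the ensemble
    residual -= m.predict(X)     # update the error for the next model
    models.append(m)

# The ensemble prediction is the sum of all models' predictions.
y_pred = sum(m.predict(X) for m in models)
```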
- As far as I have read, bagging (bootstrap aggregating) also works for regression: you train each model on only a random subset of the training data, and for prediction you take the average (all models have the same weight). The details of how to sample the training data are described here.
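The bagging approach can be sketched with scikit-learn's BaggingRegressor (the base model defaults to a decision tree; n_estimators=20 is an arbitrary choice for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Each of the 20 base models is trained on a bootstrap sample of the data;
# predict() returns the plain average of their individual predictions.
bag = BaggingRegressor(n_estimators=20, random_state=0)
bag.fit(X, y)
y_pred = bag.predict(X)
```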
- Stacking is what you have already suggested: using a meta-model (for regression, a meta-regressor) trained on the outputs of the base models as its input data. An explanation and details on how to implement this can be found, for example, here:
https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/ensembles/stacking.html
Also, a related question with more information on Cross Validated: https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
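The stacking approach can be sketched as follows, assuming scikit-learn (cross_val_predict produces out-of-fold predictions, which avoids leaking each base model's training fit into the meta-model's training data; the choice of base models and the linear meta-model is just for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

base_models = [Ridge(), GradientBoostingRegressor(random_state=0)]

# Out-of-fold predictions of the base models become the meta-model's features.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5) for m in base_models
])

meta_model = LinearRegression()
meta_model.fit(meta_features, y)

# At prediction time: refit the base models on all data, stack their outputs,
# and let the meta-model combine them into the final prediction.
for m in base_models:
    m.fit(X, y)
stacked = np.column_stack([m.predict(X) for m in base_models])
y_pred = meta_model.predict(stacked)
```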