How to properly merge outputs from models in the ensemble?

I am trying to figure out how to properly create regression ensembles. I know there are various options; I use the following approach. First I define models such as Linear Regression, GBM, etc. Then I run GridSearchCV for each of these models to find the best parameters. After this I want to make a final prediction that takes the individual predictions of each model into account. The question is: how do I properly merge the individual predictions into a single Y vector? It seems that assigning a weight coefficient to each prediction is not suitable for a regression problem. If it is, then how do I obtain such weight coefficients? Maybe a good way is to train a meta-model using the individual predictions as a training set?

Klausos Klausos asked Dec 10 '15

1 Answer

Disclaimer: I have no personal experience with training ensembles, but I'm also interested in this topic.

  1. Most resources I have found on training ensembles deal with classification problems. A good article I have found, besides the Wikipedia articles, is: http://mlwave.com/kaggle-ensembling-guide/ For regression, however, that article only lists averaging. It is still possible to assign a weight coefficient to each model, e.g. based on cross-validation performance, and the prediction still makes sense: you just have to normalize the coefficients so that they sum to 1.0.
  2. Another option is boosting: you train your models one after the other, and each successive model is trained on the error of the previous one. That is, if the first model's prediction for a sample was too high, the next model will try to predict a negative value for that sample (so that the sum of the models' predictions equals the real training target). This short paragraph on Wikipedia might help to understand it: https://en.wikipedia.org/wiki/Ensemble_learning#Boosting
  3. As far as I have read, bagging (bootstrap aggregating) also works for regression. You train each model on only a random subset of the training data. For prediction you then take the average (all models have the same weight). The details of how to sample the training data are described here.
  4. Stacking is what you have already suggested: using a meta-model (in your case a regressor) that takes the base models' outputs as its input features. An explanation and implementation details can be found, for example, here: https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/ensembles/stacking.html
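The weighted-averaging and stacking options above can be sketched with scikit-learn (assumed here since the question mentions GridSearchCV). The dataset, the model choices, and the use of mean CV R² as weights are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Toy data standing in for the real problem.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ("lr", LinearRegression()),
    ("gbm", GradientBoostingRegressor(random_state=0)),
]

# Option 1 (point 1 above): weighted average, with weights derived from
# cross-validation performance and normalized to sum to 1.0.
# Caveat: this naive normalization assumes all CV scores are positive.
scores = np.array([
    cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for _, model in base_models
])
weights = scores / scores.sum()

# Column i holds the test-set predictions of base model i.
preds = np.column_stack([
    model.fit(X_train, y_train).predict(X_test) for _, model in base_models
])
y_weighted = preds @ weights

# Option 2 (point 4 above): stacking — a meta-model (Ridge here) is trained
# on out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)
stack.fit(X_train, y_train)
y_stacked = stack.predict(X_test)
```

`StackingRegressor` handles the out-of-fold bookkeeping internally, which avoids the leakage you would get by training the meta-model on in-sample base predictions.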

Also, a related question with more information is on Cross Validated: https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning

Robin Spiess answered Oct 31 '22