Is it possible to train a model in Xgboost that have multiple continuous outputs (multi regression)? What would be the objective to train such a model?
Thanks in advance for any suggestions
Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification with Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels.
Multi-output regression involves predicting two or more numerical variables. Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction.
A random forest regressor is used, which supports multi-output regression natively, so the results can be compared. The random forest regressor will only ever predict values within the range of observations or closer to zero for each of the targets.
XGBoost is a powerful approach for building supervised regression models.
My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor
. MultiOutputRegressor
trains one regressor per target and only requires that the regressor implements fit
and predict
, which xgboost happens to support.
# get some noised linear data X = np.random.random((1000, 10)) a = np.random.random((10, 3)) y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3)) # fitting multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y) # predicting print np.mean((multioutputregressor.predict(X) - y)**2, axis=0) # 0.004, 0.003, 0.005
This is probably the easiest way to regress multi-dimension targets using xgboost as you would not need to change any other part of your code (if you were using the sklearn
API originally).
However this method does not leverage any possible relation between targets. But you can try to design a customized objective function to achieve that.
It generates warnings: reg:linear is now deprecated in favor of reg:squarederror
, so I update an answer based on @ComeOnGetMe's
import numpy as np import pandas as pd import xgboost as xgb from sklearn.multioutput import MultiOutputRegressor # get some noised linear data X = np.random.random((1000, 10)) a = np.random.random((10, 3)) y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3)) # fitting multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y) # predicting print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))
Out:
[2.00592697e-05 1.50084441e-05 2.01412247e-05]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With