 

Multi-output regression in XGBoost

Is it possible to train a model in XGBoost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model?

Thanks in advance for any suggestions

asked Sep 16 '16 by user1782011


People also ask

Does XGBoost support multi-output regression?

Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification with Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels.

What is multi-output regression?

Multi-output regression involves predicting two or more numerical variables. Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction.
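Concretely, the targets are just a 2-D array with one column per output. A small sketch using scikit-learn's LinearRegression (which handles multiple targets natively; the data here is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# two targets per sample: y has shape (n_samples, 2)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.column_stack([2 * X[:, 0],          # target 1: y1 = 2x
                     3 * X[:, 0] + 1])     # target 2: y2 = 3x + 1

model = LinearRegression().fit(X, y)
print(model.predict([[5.0]]))  # → [[10., 16.]]
```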

Can random forest predict multiple output?

A random forest regressor is used, which supports multi-output regression natively, so the results can be compared. The random forest regressor will only ever predict values within the range of observations or closer to zero for each of the targets.
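For instance (a sketch with made-up data), a RandomForestRegressor accepts a 2-D target array directly and returns one prediction per target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.random((300, 5))
# two targets derived from the same features
y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = forest.predict(X[:3])
print(pred.shape)  # (3, 2): one prediction per target
```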

Is XGBoost used for regression?

XGBoost is a powerful approach for building supervised regression models.


2 Answers

My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implements fit and predict, which xgboost happens to support.

import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))  # 0.004, 0.003, 0.005

This is probably the easiest way to regress multi-dimensional targets using xgboost, as you would not need to change any other part of your code (if you were using the sklearn API originally).

However, since one regressor is trained per target, this method does not leverage any possible relationships between the targets. You can try to design a customized objective function to achieve that.
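To sketch what "relating the targets" in a customized objective could mean (this is an illustration only: the covariance matrix and the plumbing into XGBoost's custom-objective API are assumptions, not part of the answer), one option is a Mahalanobis-style squared error, whose per-sample gradient mixes the residuals of all targets:

```python
import numpy as np

# hypothetical target covariance: the two targets are positively correlated
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
sigma_inv = np.linalg.inv(sigma)

def correlated_grad(pred, y):
    """Gradient of the loss 0.5 * r @ sigma_inv @ r.T per sample, r = pred - y.

    Unlike the independent squared error (where grad = r), each target's
    gradient also depends on the other targets' residuals.
    """
    return (pred - y) @ sigma_inv

pred = np.array([[1.0, 2.0], [0.0, 0.0]])
y = np.array([[0.0, 2.0], [0.0, 1.0]])
g = correlated_grad(pred, y)
print(g)
```

Note that for the first sample the residual of target 2 is zero, yet its gradient entry is non-zero: the error on target 1 propagates through the covariance coupling.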

answered Sep 22 '22 by ComeOnGetMe


The code above generates a warning: reg:linear is now deprecated in favor of reg:squarederror, so here is an updated version of @ComeOnGetMe's answer:

import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))

Out:

[2.00592697e-05 1.50084441e-05 2.01412247e-05] 
answered Sep 19 '22 by ah bon