Regression with multi-dimensional targets

I am using scikit-learn for regression and my problem is the following: I need to regress several parameters at once, i.e. each sample's target is a vector. This works fine with some regression approaches such as ensemble.ExtraTreesRegressor and ensemble.RandomForestRegressor. Indeed, both of the aforementioned regression methods accept a 2-dimensional array of targets (one vector per sample) in the fit(X, y) method.

However, when I try ensemble.GradientBoostingRegressor, ensemble.AdaBoostRegressor or linear_model.SGDRegressor, the regressor fails to fit the model because it expects 1-dimensional targets (the y argument of fit(X, y)). With those regression methods I can therefore estimate only one parameter at a time. That is not suitable for my problem, because fitting takes some time and I need to estimate about 20 parameters. On the other hand, I really would like to test those approaches.

So, my question is: Does anyone know if there is a solution to fit the model once and estimate several parameters for ensemble.GradientBoostingRegressor, ensemble.AdaBoostRegressor and linear_model.SGDRegressor?

I hope I've been clear enough.

asked Feb 04 '14 by user1120168


2 Answers

What you have is a multi-output (multivariate) regression problem.

Not every regression method in scikit-learn can handle this sort of problem, and you should consult each estimator's documentation to find out. In particular, neither SGDRegressor, GradientBoostingRegressor nor AdaBoostRegressor supports it at the moment: their fit(X, y) expects X : array-like, shape = [n_samples, n_features] and y : array-like, shape = [n_samples].
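If you still want to use one of those estimators, a common workaround is to fit one single-output model per target column and stack the predictions afterwards. A minimal sketch with toy data (the variable names here are illustrative, not from the question):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# toy data: 4 samples, 2 features, 3 targets
X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
Y = np.array([[0., 1., 2.],
              [1., 2., 3.],
              [2., 3., 4.],
              [3., 4., 5.]])

# fit one regressor per target column
models = [GradientBoostingRegressor().fit(X, Y[:, i])
          for i in range(Y.shape[1])]

# stack the per-target predictions back into shape (n_samples, n_targets)
preds = np.column_stack([m.predict(X) for m in models])
print(preds.shape)  # (4, 3)
```

With ~20 targets this just means ~20 fits, which is exactly the cost the question wants to avoid, but it keeps the API of any single-output regressor usable.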

However, you can use other methods in scikit-learn. For example, linear models:

from sklearn import linear_model

# multivariate input
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
# univariate output
Y = [0., 1., 2., 3.]
# multivariate output
Z = [[0., 1.], [1., 2.], [2., 3.], [3., 4.]]

# ordinary least squares
clf = linear_model.LinearRegression()
# univariate
clf.fit(X, Y)
clf.predict([[1, 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1, 0.]])

# ridge regression (note: BayesianRidge only accepts 1-d targets,
# so Ridge is used here for the multivariate case)
clf = linear_model.Ridge()
# univariate
clf.fit(X, Y)
clf.predict([[1, 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1, 0.]])

# lasso
clf = linear_model.Lasso()
# univariate
clf.fit(X, Y)
clf.predict([[1, 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1, 0.]])
answered Oct 06 '22 by emiguevara

As already mentioned, only some models natively support multivariate output. If you want to use one of the others, you can wrap it in MultiOutputRegressor, a meta-estimator that fits one copy of the regressor per target (optionally in parallel).

You can use it like this:

from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

linnerud = load_linnerud()

X = linnerud.data
Y = linnerud.target

# to set number of jobs to the number of cores, use n_jobs=-1
MultiOutputRegressor(GradientBoostingRegressor(), n_jobs=-1).fit(X, Y)
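A quick check that the wrapped estimator then predicts all targets at once (linnerud has 3 targets, so predictions come back with one column per target):

```python
from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

linnerud = load_linnerud()
X, Y = linnerud.data, linnerud.target  # X: (20, 3), Y: (20, 3)

model = MultiOutputRegressor(GradientBoostingRegressor()).fit(X, Y)

# one row per sample, one column per target
pred = model.predict(X[:2])
print(pred.shape)  # (2, 3)
```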
answered Oct 06 '22 by Jonathan Striebel