 

Using a transformer (estimator) to transform the target labels in sklearn.pipeline

Tags:

scikit-learn

I understand that one can chain several estimators that implement the transform method to transform X (the feature set) in sklearn.pipeline. However, I have a use case where I would also like to transform the target labels (for example, mapping the labels to [1...K] instead of [0, K-1]), and I would love to do that as a component in my pipeline. Is it possible to do that at all using sklearn.pipeline?

vkmv asked Sep 03 '13

People also ask

What are the essential methods for transformer in scikit-learn?

For our transformer to work smoothly with Scikit-Learn, it should implement three methods: fit(), transform(), and fit_transform().
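
For instance, a minimal custom transformer might look like the sketch below (the LogTransformer name and its log1p behaviour are purely illustrative):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogTransformer(BaseEstimator, TransformerMixin):
    """Illustrative transformer that log1p-transforms every feature."""

    def fit(self, X, y=None):
        # Nothing to learn from the data; just return self
        return self

    def transform(self, X):
        return np.log1p(X)

    # fit_transform() is provided for free by TransformerMixin

print(LogTransformer().fit_transform(np.array([[0.0, 1.0], [2.0, 3.0]])))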

When should I use Sklearn ColumnTransformer?

Use scikit-learn's ColumnTransformer to apply preprocessing transformers such as MinMaxScaler and OneHotEncoder to numeric and categorical features simultaneously. ColumnTransformer bundles all of these transformations into a single object that can be used inside scikit-learn pipelines.
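
A minimal sketch (the column names and toy data are made up for illustration):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame with one numeric and one categorical column
df = pd.DataFrame({"age": [20, 35, 50], "colour": ["red", "green", "red"]})

# Scale the numeric column and one-hot encode the categorical one in one object;
# sparse_threshold=0 forces a dense output so it prints nicely
ct = ColumnTransformer(
    [("num", MinMaxScaler(), ["age"]),
     ("cat", OneHotEncoder(), ["colour"])],
    sparse_threshold=0,
)
print(ct.fit_transform(df))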

What does the Fit () method do?

The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning. Note that the model is fitted using X and y, but the object holds no reference to X and y.
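
For example, with toy data, the two signatures look like this:

import numpy as np
from sklearn import linear_model, preprocessing

X = np.random.rand(10, 2)
y = np.random.rand(10)

scaler = preprocessing.StandardScaler().fit(X)  # unsupervised: one array
model = linear_model.Ridge().fit(X, y)          # supervised: two arrays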

What is a transformer in Sklearn?

In scikit-learn, Transformers are objects that transform a dataset into a new one, preparing it for predictive modeling, e.g., scaling numeric values, one-hot encoding categoricals, etc.
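
For example, one-hot encoding a single categorical column (toy data for illustration):

import numpy as np
from sklearn import preprocessing

X = np.array([["red"], ["green"], ["red"]])
enc = preprocessing.OneHotEncoder()

# fit_transform returns a sparse matrix; toarray() makes it printable
print(enc.fit_transform(X).toarray())
# [[0. 1.]
#  [1. 0.]
#  [0. 1.]]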


1 Answer

There is now a nicer way to do this built into scikit-learn: compose.TransformedTargetRegressor.

When constructing these objects, you give them a regressor and a transformer. When you .fit() them, they transform the targets before regressing, and when you .predict() them, they transform their predicted targets back to the original space.

It's important to note that you can pass them a pipeline object, so they should interface nicely with your existing setup. For example, here I train a ridge regression to predict one target from two features:

# Imports
import numpy as np
from sklearn import compose, linear_model, metrics, pipeline, preprocessing

# Generate some training and test features and targets
X_train = np.random.rand(200).reshape(100, 2)
y_train = 1.2 * X_train[:, 0] + 3.4 * X_train[:, 1] + 5.6
X_test = np.random.rand(20).reshape(10, 2)
y_test = 1.2 * X_test[:, 0] + 3.4 * X_test[:, 1] + 5.6

# Define my model and scalers
ridge = linear_model.Ridge(alpha=1e-2)
scaler = preprocessing.StandardScaler()
minmax = preprocessing.MinMaxScaler(feature_range=(-1, 1))

# Construct a pipeline from these methods
pipe = pipeline.make_pipeline(scaler, ridge)

# Construct a TransformedTargetRegressor using this pipeline
# ** So far the set-up has been standard **
regr = compose.TransformedTargetRegressor(regressor=pipe, transformer=minmax)

# Fit and predict with regr like you would a pipeline
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print("MAE: {}".format(metrics.mean_absolute_error(y_test, y_pred)))

This still isn't quite as smooth as I'd like it to be. For example, you can access the regressor contained by a TransformedTargetRegressor via .regressor_, but the coefficients stored there apply to the scaled features and transformed targets, not the originals. This means there are some extra hoops to jump through if you want to work your way back to the equation that generated the data.
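
Here is a sketch of how you could recover the original-space coefficients for the example above (it assumes the step names that make_pipeline generates, i.e. "standardscaler" and "ridge"):

# Unpack the fitted pieces (step names come from make_pipeline)
inner = regr.regressor_                         # the fitted pipeline
x_scaler = inner.named_steps["standardscaler"]  # fitted StandardScaler
fitted_ridge = inner.named_steps["ridge"]       # fitted Ridge
y_scaler = regr.transformer_                    # fitted MinMaxScaler on y

w, b = fitted_ridge.coef_, fitted_ridge.intercept_

# Undo the feature standardisation, then the target min-max scaling:
# y_scaled = w.(x - mean)/std + b  and  y = (y_scaled - min_) / scale_
coef_orig = (w / x_scaler.scale_) / y_scaler.scale_[0]
intercept_orig = (b - np.dot(w, x_scaler.mean_ / x_scaler.scale_)
                  - y_scaler.min_[0]) / y_scaler.scale_[0]

print(coef_orig, intercept_orig)  # should be close to [1.2, 3.4] and 5.6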

Ari Cooper-Davis answered Sep 25 '22