I understand that one can chain several estimators that implement the transform method to transform X (the feature set) in sklearn.pipeline. However, I have a use case where I would also like to transform the target labels (e.g., transform the labels to [1, K] instead of [0, K-1]), and I would love to do that as a component in my pipeline. Is it possible to do that at all using sklearn.pipeline?
In scikit-learn, transformers are objects that transform a dataset into a new one to prepare it for predictive modeling, e.g., scaling numeric values, one-hot encoding categoricals, etc. For a custom transformer to work smoothly with scikit-learn, it should have three methods: fit(), transform(), and fit_transform(). The fit() method takes the training data as arguments, which can be one array in the case of unsupervised learning, or two arrays in the case of supervised learning; note that the model is fitted using X and y, but the object holds no reference to X and y. To apply preprocessing transformers such as MinMaxScaler and OneHotEncoder to numeric and categorical features simultaneously, use ColumnTransformer, which bundles all the transformations into one object that can be used with scikit-learn pipelines.
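As a minimal sketch of that interface (the class name and offset parameter here are hypothetical; inheriting from TransformerMixin supplies fit_transform() for free once fit() and transform() are defined):

from sklearn.base import BaseEstimator, TransformerMixin

class ShiftLabels(BaseEstimator, TransformerMixin):
    # Hypothetical transformer: shift integer labels by a constant
    # offset, e.g. from [0, K-1] to [1, K]
    def __init__(self, offset=1):
        self.offset = offset

    def fit(self, X, y=None):
        # Nothing to learn from the data; fit() just returns self
        return self

    def transform(self, X):
        return X + self.offset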
There is now a nicer way to do this built into scikit-learn: compose.TransformedTargetRegressor.

When constructing these objects you give them a regressor and a transformer. When you .fit() them they transform the targets before regressing, and when you .predict() them they transform their predicted targets back to the original space.

It's important to note that you can pass them a pipeline object, so they should interface nicely with your existing setup. For example, take the following setup where I train a ridge regression to predict 1 target given 2 features:
# Imports
import numpy as np
from sklearn import compose, linear_model, metrics, pipeline, preprocessing

# Generate some training and test features and targets
X_train = np.random.rand(200).reshape(100, 2)
y_train = 1.2 * X_train[:, 0] + 3.4 * X_train[:, 1] + 5.6
X_test = np.random.rand(20).reshape(10, 2)
y_test = 1.2 * X_test[:, 0] + 3.4 * X_test[:, 1] + 5.6

# Define my model and scalers
ridge = linear_model.Ridge(alpha=1e-2)
scaler = preprocessing.StandardScaler()
minmax = preprocessing.MinMaxScaler(feature_range=(-1, 1))

# Construct a pipeline using these methods
pipe = pipeline.make_pipeline(scaler, ridge)

# Construct a TransformedTargetRegressor using this pipeline
# ** So far the set-up has been standard **
regr = compose.TransformedTargetRegressor(regressor=pipe, transformer=minmax)

# Fit and predict with regr like you would a pipeline
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print("MAE: {}".format(metrics.mean_absolute_error(y_test, y_pred)))
This still isn't quite as smooth as I'd like it to be. For example, you can access the regressor contained by a TransformedTargetRegressor using .regressor_, but the coefficients stored there are for the transformed targets, not the original ones. This means there are some extra hoops to jump through if you want to work your way back to the equation that generated the data.
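For what it's worth, here is a minimal sketch of those hoops for the example above. It recovers the original-space coefficients by inverting the two fitted scalers by hand; regressor_ and transformer_ are the fitted objects that TransformedTargetRegressor exposes, and the step names come from make_pipeline's lowercased class names:

# Pull out the fitted pieces (assumes regr from the example above)
inner = regr.regressor_                      # fitted pipeline: scaler + ridge
std = inner.named_steps["standardscaler"]    # fitted StandardScaler on X
lin = inner.named_steps["ridge"]             # fitted Ridge on scaled data
tt = regr.transformer_                       # fitted MinMaxScaler on y

# Inner model: y_scaled = w . x_scaled + b, where
# x_scaled = (x - std.mean_) / std.scale_ and y_scaled = y * tt.scale_ + tt.min_
w, b = lin.coef_, lin.intercept_
coef_orig = w / (std.scale_ * tt.scale_[0])
intercept_orig = (b - np.sum(w * std.mean_ / std.scale_) - tt.min_[0]) / tt.scale_[0]

# Should approximately recover the generating equation: [1.2, 3.4] and 5.6
print(coef_orig, intercept_orig)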