I'm using sklearn pipelines to build a Keras autoencoder model and use gridsearch to find the best hyperparameters. This works fine if I use a Multilayer Perceptron model for classification; however, in the autoencoder I need the output values to be the same as input. In other words, I am using a StandardScalar
instance in the pipeline to scale the input values and therefore this leads to my question: how can I make the StandardScalar
instance inside the pipeline to work on both the input data as well as target data, so that they end up to be the same?
I'm providing a code snippet as an example.
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop, Adam
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
X, y = make_classification (n_features = 50, n_redundant = 0, random_state = 0,
scale = 100, n_clusters_per_class = 1)
# Define wrapper
def create_model (learn_rate = 0.01, input_shape, metrics = ['mse']):
model = Sequential ()
model.add (Dense (units = 64, activation = 'relu',
input_shape = (input_shape, )))
model.add (Dense (32, activation = 'relu'))
model.add (Dense (8, activation = 'relu'))
model.add (Dense (32, activation = 'relu'))
model.add (Dense (input_shape, activation = None))
model.compile (loss = 'mean_squared_error',
optimizer = Adam (lr = learn_rate),
metrics = metrics)
return model
# Create scaler
my_scaler = StandardScaler ()
steps = list ()
steps.append (('scaler', my_scaler))
standard_scaler_transformer = Pipeline (steps)
# Create classifier
clf = KerasRegressor (build_fn = create_model, verbose = 2)
# Assemble pipeline
# How to scale input and output??
clf = Pipeline (steps = [('scaler', my_scaler),
('classifier', clf)],
verbose = True)
# Run grid search
param_grid = {'classifier__input_shape' : [X.shape [1]],
'classifier__batch_size' : [50],
'classifier__learn_rate' : [0.001],
'classifier__epochs' : [5, 10]}
cv = KFold (n_splits = 5, shuffle = False)
grid = GridSearchCV (estimator = clf, param_grid = param_grid,
scoring = 'neg_mean_squared_error', verbose = 1, cv = cv)
grid_result = grid.fit (X, X)
print ('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
You can use TransformedTargetRegressor
to apply arbitrary transformations on the target values (i.e. y
) by providing either a function (i.e. using func
argument) or a transformer (i.e. transformer
argument).
In this case (i.e. fitting an auto-encoder model), since you want to apply the same StandardScalar
instance on the target values as well, you can use transformer
argument. And it could be done in one of the following ways:
You can use it as one of the pipeline steps, wrapping the regressor:
scaler = StandardScaler()
regressor = KerasRegressor(...)
pipe = Pipeline(steps=[
('scaler', scaler),
('ttregressor', TransformedTargetRegressor(regressor, transformer=scaler))
])
# Use `__regressor` to access the regressor hyperparameters
param_grid = {'ttregressor__regressor__hyperparam_name' : ...}
gridcv = GridSearchCV(estimator=pipe, param_grid=param_grid, ...)
gridcv.fit(X, X)
Alternatively, you can wrap it around the GridSearchCV
like this:
ttgridcv = TransformedTargetRegressor(GridSearchCV(...), transformer=scalar)
ttgridcv.fit(X, X)
# Use `regressor_` attribute to access the fitted regressor (i.e. `GridSearchCV` instance)
print(ttgridcv.regressor_.best_score_, ttgridcv.regressor_.best_params_))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With