
How to scale target values of a Keras autoencoder model using a sklearn pipeline?

I'm using sklearn pipelines to build a Keras autoencoder model and use grid search to find the best hyperparameters. This works fine if I use a multilayer perceptron model for classification; however, for the autoencoder I need the output values to be the same as the input. The pipeline uses a StandardScaler instance to scale the input values, which leads to my question: how can I make the StandardScaler instance inside the pipeline work on both the input data and the target data, so that they end up being the same?

I'm providing a code snippet as an example.

from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop, Adam
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

X, y = make_classification (n_features = 50, n_redundant = 0, random_state = 0,
                            scale = 100, n_clusters_per_class = 1)

# Define wrapper
# input_shape has no default, so it must come before the defaulted arguments
def create_model (input_shape, learn_rate = 0.01, metrics = ['mse']):
  model = Sequential ()
  model.add (Dense (units = 64, activation = 'relu',
                   input_shape = (input_shape, )))
  model.add (Dense (32, activation = 'relu'))
  model.add (Dense (8,  activation = 'relu'))
  model.add (Dense (32, activation = 'relu'))
  model.add (Dense (input_shape, activation = None))
  model.compile (loss = 'mean_squared_error',
                 optimizer = Adam (learning_rate = learn_rate),
                 metrics = metrics)
  return model

# Create scaler
my_scaler = StandardScaler ()
steps = list ()
steps.append (('scaler', my_scaler))
standard_scaler_transformer = Pipeline (steps)

# Create classifier
clf = KerasRegressor (build_fn = create_model, verbose = 2)

# Assemble pipeline
# How to scale input and output??
clf = Pipeline (steps = [('scaler', my_scaler),
                         ('classifier', clf)],
                verbose = True)

# Run grid search
param_grid = {'classifier__input_shape' : [X.shape [1]],
              'classifier__batch_size' : [50],
              'classifier__learn_rate' : [0.001],
              'classifier__epochs' : [5, 10]}
cv = KFold (n_splits = 5, shuffle = False)
grid = GridSearchCV (estimator = clf, param_grid = param_grid,
                     scoring = 'neg_mean_squared_error', verbose = 1, cv = cv)
grid_result = grid.fit (X, X)

print ('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))

kaylani2 asked Jul 26 '20 00:07


1 Answer

You can use TransformedTargetRegressor to apply arbitrary transformations to the target values (i.e. y) by providing either a function (the func argument) or a transformer (the transformer argument).
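As a rough illustration of the two arguments, here is a minimal sketch on synthetic data. The data, the LinearRegression stand-in, and the choice of np.log / np.exp are assumptions made for the example and are not part of the original question.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): a strictly positive target on a large scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1000.0 + 50.0 * X[:, 0] + rng.normal(size=100)

# transformer=: y is standardized before the regressor is fitted, and
# predictions are mapped back to the original scale automatically.
tt_scaler = TransformedTargetRegressor(
    regressor=LinearRegression(), transformer=StandardScaler()
)
tt_scaler.fit(X, y)
preds_scaler = tt_scaler.predict(X)

# func= / inverse_func=: the same idea with a pair of plain functions.
tt_func = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
tt_func.fit(X, y)
preds_func = tt_func.predict(X)

# The fitted (cloned) target scaler is available as `transformer_`.
print(tt_scaler.transformer_.mean_, preds_scaler[:3], preds_func[:3])
```

In both cases the predictions come back in the units of the original y, so metrics computed on them are comparable with the raw targets.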

In this case (i.e. fitting an autoencoder model), since you want to apply the same StandardScaler to the target values as well, you can use the transformer argument. This can be done in one of the following ways:

  • You can use it as one of the pipeline steps, wrapping the regressor:

    from sklearn.compose import TransformedTargetRegressor

    scaler = StandardScaler()
    regressor = KerasRegressor(...)
    
    pipe = Pipeline(steps=[
        ('scaler', scaler),
        ('ttregressor', TransformedTargetRegressor(regressor, transformer=scaler))
    ])
    
    # Use `__regressor` to access the regressor hyperparameters
    param_grid = {'ttregressor__regressor__hyperparam_name' : ...}
    
    gridcv = GridSearchCV(estimator=pipe, param_grid=param_grid, ...)
    gridcv.fit(X, X)
    
  • Alternatively, you can wrap it around the GridSearchCV like this:

     ttgridcv = TransformedTargetRegressor(GridSearchCV(...), transformer=scaler)
     ttgridcv.fit(X, X)
    
     # Use `regressor_` attribute to access the fitted regressor (i.e. `GridSearchCV` instance) 
     print(ttgridcv.regressor_.best_score_, ttgridcv.regressor_.best_params_)
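Both options can be sketched end to end as follows. To keep the example runnable without Keras, it swaps in sklearn's MLPRegressor with a bottleneck layer as a stand-in autoencoder. That swap, the synthetic data, and the tuned hyperparameter (learning_rate_init) are assumptions for illustration only; with the Keras wrapper from the question, the regressor and the parameter names would differ.

```python
import warnings

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Short training runs may not converge; silence that warning for the demo.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic data on a large scale, so that scaling matters (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=500.0, scale=100.0, size=(200, 10))

# Stand-in for the Keras autoencoder: a small bottleneck MLP (assumption).
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 4, 16), max_iter=200,
                           random_state=0)
cv = KFold(n_splits=3, shuffle=False)

# Option 1: the target transformer wraps the regressor inside the pipeline.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),  # scales the inputs
    ("ttregressor", TransformedTargetRegressor(
        regressor=autoencoder, transformer=StandardScaler())),  # scales targets
])
param_grid = {"ttregressor__regressor__learning_rate_init": [0.001, 0.01]}
gridcv = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=cv)
gridcv.fit(X, X)
reconstruction = gridcv.predict(X)  # returned in the original units of X
print(gridcv.best_score_, gridcv.best_params_)

# Option 2: the target transformer wraps the whole grid search.
inner_pipe = Pipeline(steps=[("scaler", StandardScaler()),
                             ("regressor", autoencoder)])
inner_grid = GridSearchCV(inner_pipe,
                          {"regressor__learning_rate_init": [0.001, 0.01]},
                          scoring="neg_mean_squared_error", cv=cv)
ttgridcv = TransformedTargetRegressor(regressor=inner_grid,
                                      transformer=StandardScaler())
ttgridcv.fit(X, X)
print(ttgridcv.regressor_.best_score_, ttgridcv.regressor_.best_params_)
```

One difference worth keeping in mind: in the first option the target scaler is refitted on each training fold and the scores are in the original units, whereas in the second option the target scaler is fitted once on all of the targets before the grid search splits them, so best_score_ is measured on scaled targets.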
    
today answered Sep 27 '22 18:09
