Keras: Out of memory when doing hyperparameter grid search

I'm running multiple nested loops to do a hyperparameter grid search. Each nested loop iterates over a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated each time using a generator. (I'm not doing any training; I just randomly initialize the model, evaluate it several times, and retrieve the average loss.)

My problem is that during this process, Keras seems to be filling up my GPU memory, so that I eventually get an OOM error.

Does anybody know how to solve this and free up the GPU memory each time after a model is evaluated?

I do not need the model anymore at all after it has been evaluated, I can throw it away entirely every time before building a new one in the next pass of the inner loop.

I'm using the Tensorflow backend.

Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth loop,

for fsize in fsizes:

I guess the details of how the model is built don't matter much, but here is all of it anyway:

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:

                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(activation,i,np.array_str(np.array(width)),fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                #Structure for this network
                structure = layer_structures[i]

                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()

                model.add(Lambda(lambda x: x/127.5 - 1.,
                          input_shape=(row, col, ch),
                          output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        elif activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j-1] == 'dense':
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())
                        else:
                            model.add(Flatten())
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())

                model.add(Dense(1))

                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160,80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5

                model_losses.append(average_loss)

                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()
Asked Feb 05 '17 by Alex


2 Answers

As indicated, the backend being used is TensorFlow. With the TensorFlow backend the graph of each model you build stays in the session rather than being destroyed, so you need to clear the session yourself.

After you are done with a model, just put:

if K.backend() == 'tensorflow':
    K.clear_session()

This requires the backend import:

from keras import backend as K
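Putting both snippets together, the question's inner loop can clear the session after every evaluation. Below is a minimal sketch using `tf.keras`; the tiny two-layer model and the random validation data are illustrative only, not the asker's actual network:

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def evaluate_fresh_model(width, X_val, y_val):
    """Build a freshly initialized model, evaluate it, then free its graph."""
    model = Sequential([
        Dense(width, activation='relu', input_shape=(X_val.shape[1],)),
        Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    loss = model.evaluate(X_val, y_val, verbose=0)
    K.clear_session()  # release the graph so GPU memory does not accumulate
    return loss

X_val = np.random.rand(32, 8).astype('float32')
y_val = np.random.rand(32).astype('float32')
losses = [evaluate_fresh_model(w, X_val, y_val) for w in (4, 8, 16)]
```

Because the session is cleared between iterations, each pass starts from an empty graph instead of stacking a new model on top of the previous ones.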

You can also use the scikit-learn wrapper to do the grid search; check this example: here. For more advanced hyperparameter search you can use hyperas.

Answered Oct 13 '22 by indraforyou


Using the tip given by indraforyou, I added the code to clear the TensorFlow session inside the function I pass to GridSearchCV, like this:

from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense

def create_model():
    # cleanup: clear the previous model's graph before building a new one
    K.clear_session()

    inputs = Input(shape=(4096,))
    x = Dense(2048, activation='relu')(inputs)
    p = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=p)
    model.compile(optimizer='SGD',
                  loss='mse',
                  metrics=['accuracy'])
    return model

And then I can invoke the grid search:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn=create_model)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)

It should work.
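If you cannot route everything through the scikit-learn wrapper, the same cleanup can be done by hand at the end of each pass of the question's inner loop. This is a sketch assuming `tf.keras`; the tiny model stands in for whatever you actually evaluate:

```python
import gc
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

evaluated = []
for width in (4, 8):
    model = Sequential([Dense(width, input_shape=(3,)), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    # ... evaluate the freshly initialized model and record its loss here ...
    evaluated.append(width)
    del model          # drop the Python reference to the model
    K.clear_session()  # reset the backend graph so GPU memory can be reclaimed
    gc.collect()       # encourage Python to free the released objects
```

The `K.clear_session()` call is the important part; `del` and `gc.collect()` just make sure no lingering Python references keep the old graph alive.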

Cheers!

Answered Oct 13 '22 by Andrey Kuehlkamp