I'm running multiple nested loops to do a hyperparameter grid search. Each nested loop runs through a list of hyperparameter values, and inside the innermost loop a Keras Sequential model is built and evaluated each time using a generator. (I'm not doing any training; I'm just randomly initializing the model, evaluating it several times, and retrieving the average loss.)
My problem is that during this process Keras seems to fill up my GPU memory, so that I eventually get an OOM error.
Does anybody know how to solve this and free up the GPU memory each time after a model is evaluated?
I do not need the model at all after it has been evaluated; I can throw it away entirely every time before building a new one in the next pass of the inner loop.
I'm using the TensorFlow backend.
Here is the code, although much of it isn't relevant to the general problem. The model is built inside the fourth (innermost) loop,
for fsize in fsizes:
I guess the details of how the model is built don't matter much, but here is all of it anyway:
# Imports assumed by this snippet (Keras 1.x API; not shown in the original)
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Lambda, Activation, Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU

model_losses = []
model_names = []

for activation in activations:
    for i in range(len(layer_structures)):
        for width in layer_widths[i]:
            for fsize in fsizes:
                model_name = "test_{}_struc-{}_width-{}_fsize-{}".format(
                    activation, i, np.array_str(np.array(width)), fsize)
                model_names.append(model_name)
                print("Testing new model: ", model_name)

                # Structure for this network
                structure = layer_structures[i]
                row, col, ch = 80, 160, 3  # Input image format

                model = Sequential()
                model.add(Lambda(lambda x: x / 127.5 - 1.,
                                 input_shape=(row, col, ch),
                                 output_shape=(row, col, ch)))

                for j in range(len(structure)):
                    if structure[j] == 'conv':
                        model.add(Convolution2D(width[j], fsize, fsize))
                        model.add(BatchNormalization(axis=3, momentum=0.99))
                        if activation == 'relu':
                            model.add(Activation('relu'))
                        if activation == 'elu':
                            model.add(ELU())
                        model.add(MaxPooling2D())
                    elif structure[j] == 'dense':
                        if structure[j - 1] == 'dense':
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())
                        else:
                            model.add(Flatten())
                            model.add(Dense(width[j]))
                            model.add(BatchNormalization(axis=1, momentum=0.99))
                            if activation == 'relu':
                                model.add(Activation('relu'))
                            elif activation == 'elu':
                                model.add(ELU())

                model.add(Dense(1))

                # Evaluate the model 5 times and average the loss
                average_loss = 0
                for k in range(5):
                    model.compile(optimizer="adam", loss="mse")
                    val_generator = generate_batch(X_val, y_val, resize=(160, 80))
                    loss = model.evaluate_generator(val_generator, len(y_val))
                    average_loss += loss

                average_loss /= 5
                model_losses.append(average_loss)
                print("Average loss after 5 initializations: {:.3f}".format(average_loss))
                print()
As indicated, the backend being used is TensorFlow. With the TensorFlow backend the current model is not destroyed when you build the next one; the graph keeps growing in the underlying session, so you need to clear the session yourself.
After the usage of the model just put:

if K.backend() == 'tensorflow':
    K.clear_session()

Include the backend import:

from keras import backend as K
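Applied to the question's loop, the cleanup would sit right after each evaluation. A minimal sketch, where build_model is a hypothetical helper standing in for the Sequential construction shown above, and del / gc.collect() are optional but make the intent explicit:

import gc

from keras import backend as K

for fsize in fsizes:
    # build_model is a hypothetical helper wrapping the Sequential
    # construction from the question
    model = build_model(activation, structure, width, fsize)
    model.compile(optimizer="adam", loss="mse")
    val_generator = generate_batch(X_val, y_val, resize=(160, 80))
    loss = model.evaluate_generator(val_generator, len(y_val))

    # Done with this model: drop the Python reference and clear the
    # TensorFlow session so the graph stops accumulating on the GPU
    del model
    if K.backend() == 'tensorflow':
        K.clear_session()
    gc.collect()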
Also, you can use the sklearn wrapper to do the grid search, as the answer below demonstrates. For more advanced hyperparameter search you can use hyperas.
Using the tip given by indraforyou, I added code to clear the TensorFlow session inside the function I pass to GridSearchCV, like this:
# Imports assumed by this snippet (not shown in the original)
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model():
    # Cleanup: clear the previous model's graph before building a new one
    K.clear_session()

    inputs = Input(shape=(4096,))
    x = Dense(2048, activation='relu')(inputs)
    p = Dense(2, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=p)
    model.compile(optimizer='SGD',
                  loss='mse',
                  metrics=['accuracy'])
    return model
And then I can invoke the grid search:
model = KerasClassifier(build_fn=create_model)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
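For reference, param_grid is not defined in the snippet above; it would be declared before the GridSearchCV call. Since create_model takes no arguments, only fit-time parameters can be searched here. A minimal hypothetical sketch:

# Hypothetical search space: with a zero-argument build_fn, only
# fit-time parameters such as batch size and epoch count can be tuned
# (in Keras 1.x the epochs key is spelled 'nb_epoch')
param_grid = {
    'batch_size': [32, 64],
    'epochs': [5, 10],
}

grid_result = grid.fit(X_train, y_train)  # X_train / y_train assumed to exist
print(grid_result.best_score_, grid_result.best_params_)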
It should work.
Cheers!