Essentially, I am training an LSTM model with Keras, but when I save it the file is up to 100 MB. My goal is to deploy the model to a web server and serve it as an API, but the server cannot run it because the model is too big. After analyzing the parameters of my model, I found that it has 20,000,000 parameters, of which 15,000,000 are untrained because they belong to the pre-trained word embeddings. Is there any way to shrink the model by removing those 15,000,000 parameters while still preserving its performance?
Here is my code for the model:
def LSTModel(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)

    X = LSTM(256, return_sequences=True)(embeddings)
    X = Dropout(0.5)(X)
    X = LSTM(256, return_sequences=False)(X)
    X = Dropout(0.5)(X)
    X = Dense(NUM_OF_LABELS)(X)
    X = Activation("softmax")(X)

    model = Model(inputs=sentence_indices, outputs=X)
    return model
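(For reference, one way to confirm that the embedding layer accounts for most of the parameters is to check the per-layer counts. The sizes below are hypothetical, e.g. a 50,000-word vocabulary with 300-dimensional vectors already gives 15,000,000 embedding weights; MAX_LEN is a placeholder for the sequence length:)

# Hypothetical sizes -- adjust to your own vocabulary and embedding dimension.
vocab_size = 50000      # number of words in word_to_index
embedding_dim = 300     # dimensionality of the pre-trained vectors
print(vocab_size * embedding_dim)   # 15,000,000 frozen embedding weights

model = LSTModel((MAX_LEN,), word_to_vec_map, word_to_index)
model.summary()                     # per-layer parameter counts
for layer in model.layers:
    print(layer.name, layer.count_params(), layer.trainable)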
LSTM default return value: the output is a 2D array of real numbers, whose second dimension is the dimensionality of the output space, i.e. the units parameter of the Keras LSTM layer.
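As a minimal sketch (with hypothetical batch, timestep, and feature sizes), you can see the two output shapes directly:

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

inp = Input((10, 32))                         # 10 timesteps, 32 features
last_only = LSTM(256)(inp)                    # return_sequences=False (default)
full_seq = LSTM(256, return_sequences=True)(inp)

m = Model(inp, [last_only, full_seq])
out1, out2 = m.predict(np.zeros((4, 10, 32)))
print(out1.shape)   # (4, 256)      -- 2D: (batch, units)
print(out2.shape)   # (4, 10, 256)  -- 3D: (batch, timesteps, units)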
A vanilla LSTM network has three layers: an input layer, a single hidden LSTM layer, and a standard feedforward output layer.
For convolutional models, a good way to drastically lower the parameter count is to add subsample=(2, 2) to the convolutional layers above the Flatten layer (careful: it lowers the resolution of the images/data). If subsample is not accepted by your Keras version, the argument is called strides=(2, 2).
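For illustration only (this applies to convolutional models, not to the LSTM above), a rough sketch of the Keras 2 syntax with hypothetical layer sizes:

from keras.layers import Conv2D, Flatten, Dense
from keras.models import Sequential

# Strided convolutions halve the spatial resolution at each layer,
# which shrinks the Flatten output and the Dense layer that follows it.
model = Sequential([
    Conv2D(32, (3, 3), strides=(2, 2), activation="relu", input_shape=(64, 64, 3)),
    Conv2D(64, (3, 3), strides=(2, 2), activation="relu"),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.summary()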
Define the layers you want to save outside the function and give them names. Then create two functions, foo() and bar(). foo() will contain the original pipeline, including the embedding layer; bar() will contain only the part of the pipeline AFTER the embedding layer. Instead of the embedding layer, you define a new Input() layer in bar() with the dimensions of your embeddings:
lstm1 = LSTM(256, return_sequences=True, name='lstm1')
lstm2 = LSTM(256, return_sequences=False, name='lstm2')
dense = Dense(NUM_OF_LABELS, name='Susie Dense')

def foo(input_shape, word_to_vec_map, word_to_index):
    sentence_indices = Input(input_shape, dtype="int32")
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)

    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)

    return Model(inputs=sentence_indices, outputs=X)
def bar(embedding_shape):
    embeddings = Input(embedding_shape, dtype="float32")

    X = lstm1(embeddings)
    X = Dropout(0.5)(X)
    X = lstm2(X)
    X = Dropout(0.5)(X)
    X = dense(X)
    X = Activation("softmax")(X)

    return Model(inputs=embeddings, outputs=X)
foo_model = foo(...)
bar_model = bar(...)
foo_model.fit(...)
bar_model.save_weights(...)
Now you train the original foo() model and then save the weights of the reduced bar() model. Because the two models share the same layer objects, bar_model already holds the trained LSTM and Dense weights, just without the embedding layer. When loading these weights into a model whose architecture differs (for example back into foo_model), don't forget the by_name=True parameter:

foo_model.load_weights('bar_model.h5', by_name=True)
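For completeness, here is a hedged sketch of what serving the reduced model could look like: the embedding lookup happens outside the model, using the same word_to_vec_map as during training, and only bar()'s weights are shipped to the server. MAX_LEN, EMBEDDING_DIM, and embed_sentence are hypothetical names for illustration:

import numpy as np

# Assumed constants: MAX_LEN timesteps and EMBEDDING_DIM-dimensional vectors.
serving_model = bar((MAX_LEN, EMBEDDING_DIM))
serving_model.load_weights('bar_model.h5', by_name=True)

def embed_sentence(words):
    # Look up each word's pre-trained vector outside the model,
    # padding/truncating to MAX_LEN timesteps.
    vectors = np.zeros((MAX_LEN, EMBEDDING_DIM))
    for i, w in enumerate(words[:MAX_LEN]):
        if w in word_to_vec_map:
            vectors[i] = word_to_vec_map[w]
    return vectors

batch = np.expand_dims(embed_sentence("this movie was great".split()), axis=0)
probs = serving_model.predict(batch)   # shape (1, NUM_OF_LABELS)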