
Keras: Use the same layer in different models (share weights)

Quick answer:

This is in fact really easy. Here's the code (for those who don't want to read all that text):

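from keras.layers import Input, Dense
from keras.models import Model

inputs=Input((784,))
encode=Dense(10)(inputs)
decode=Dense(784)  # created but not called yet, so it can be applied to several inputs

# Keras 1 spelled these keyword arguments input= and output=
model=Model(inputs=inputs, outputs=decode(encode))

inputs_2=Input((10,))
decode_model=Model(inputs=inputs_2, outputs=decode(inputs_2))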

In this setup, the decode_model will use the same decode layer as the model. If you train the model, the decode_model will be trained, too.
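One way to check the weight sharing is to train the full model and then read the decoder weights back through decode_model. The following is only a sketch: it assumes the Keras 2+ API (`inputs=`/`outputs=`, `epochs=`), and the random array `X` is a placeholder for the real MNIST data, not something from the original post.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input((784,))
encode = Dense(10)(inputs)
decode = Dense(784)  # a single layer object, reused in both models
model = Model(inputs=inputs, outputs=decode(encode))
model.compile(loss="mse", optimizer="adadelta")

inputs_2 = Input((10,))
decode_model = Model(inputs=inputs_2, outputs=decode(inputs_2))

# The last layer of decode_model is the very same Python object as `decode`.
shared_layer = decode_model.layers[-1]
same_object = shared_layer is decode

# Train only the full model, then inspect the weights through decode_model.
kernel_before = decode.get_weights()[0].copy()
X = np.random.rand(64, 784).astype("float32")  # placeholder data (assumption)
model.fit(X, X, batch_size=32, epochs=1, verbose=0)
kernel_after = shared_layer.get_weights()[0]
weights_changed = not np.array_equal(kernel_before, kernel_after)
```

If the layer is shared as described, `same_object` is True and the kernel seen through decode_model differs from the copy taken before training.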

Actual question:

I'm trying to create a simple autoencoder for MNIST in Keras:

This is the code so far:

model=Sequential()
encode=Dense(10, input_shape=[784])
decode=Dense(784, input_shape=[10])

model.add(encode)
model.add(decode)


model.compile(loss="mse",
             optimizer="adadelta",
             metrics=["accuracy"])

decode_model=Sequential()
decode_model.add(decode)

I'm training it to learn the identity function:

model.fit(X_train,X_train,batch_size=50, nb_epoch=10, verbose=1, 
          validation_data=[X_test, X_test])

The reconstruction is quite interesting:

[image: reconstruction output]

But I would also like to look at the representations of the clusters. What is the output of passing [1,0...0] to the decoding layer? This should be the "cluster-mean" of one class in MNIST.

In order to do that I created a second model decode_model, which reuses the decoder layer. But if I try to use that model, it complains:

Exception: Error when checking : expected dense_input_5 to have shape (None, 784) but got array with shape (10, 10)

That seemed strange. It's simply a dense layer; its weight matrix wouldn't even be able to process a 784-dimensional input. I decided to look at the model summary:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_14 (Dense)                 (None, 784)           8624        dense_13[0][0]                   
====================================================================================================
Total params: 8624

It is connected to dense_13. It's difficult to keep track of the names of the layers, but that looks like the encoder layer. Sure enough, the model summary of the whole model is:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_13 (Dense)                 (None, 10)            7850        dense_input_6[0][0]              
____________________________________________________________________________________________________
dense_14 (Dense)                 (None, 784)           8624        dense_13[0][0]                   
====================================================================================================
Total params: 16474
____________________
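The parameter counts in the two summaries can be reproduced by hand, which also confirms the shape of the decoder's weight matrix. This is a small sketch; `dense_param_count` is a hypothetical helper written for illustration, not a Keras function.

```python
def dense_param_count(input_dim, units):
    """A dense layer has an (input_dim x units) weight matrix plus one bias per unit."""
    return input_dim * units + units

encoder_params = dense_param_count(784, 10)  # dense_13: 784 -> 10
decoder_params = dense_param_count(10, 784)  # dense_14: 10 -> 784
total_params = encoder_params + decoder_params

print(encoder_params, decoder_params, total_params)
```

The decoder's count of 8624 only works out for a 10 x 784 weight matrix, so the layer itself can only consume 10-dimensional inputs; the 784 in the error message comes from the model's input, not from the layer.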

Apparently the layers are permanently connected. Strangely there is no input layer in my decode_model.

How can I reuse a layer in Keras? I've looked at the functional API, but there too, layers are fused together.

asked Oct 27 '16 at 07:10 by lhk


1 Answer

Oh, never mind.

I should have read the entire functional API: https://keras.io/getting-started/functional-api-guide/#shared-layers

Here's one of the predictions (maybe still lacking some training):

[image: one decoded prediction]

I'm guessing this could be a 3? Well, at least it works now.

And for those with similar problems, here's the updated code:

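from keras.layers import Input, Dense
from keras.models import Model

inputs=Input((784,))
encode=Dense(10)(inputs)
decode=Dense(784)  # created but not called yet, so it can be applied to several inputs

# Keras 1 spelled these keyword arguments input= and output=
model=Model(inputs=inputs, outputs=decode(encode))


model.compile(loss="mse",
             optimizer="adadelta",
             metrics=["accuracy"])

inputs_2=Input((10,))
decode_model=Model(inputs=inputs_2, outputs=decode(inputs_2))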

I only compiled one of the models. You need to compile a model for training; for prediction, that is not necessary.
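To look at what each code unit decodes to, the uncompiled decode_model can be fed one-hot vectors directly. The sketch below assumes the Keras 2+ API and uses a freshly initialised decoder as a stand-in for the trained one; the variable names `codes` and `decoded` are illustrative only.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

decode = Dense(784)  # stand-in for the trained, shared decoder layer (assumption)
inputs_2 = Input((10,))
decode_model = Model(inputs=inputs_2, outputs=decode(inputs_2))

# Row i of the identity matrix is the one-hot code [0, ..., 1, ..., 0] for unit i.
codes = np.eye(10, dtype="float32")

# predict() works without compile(); only training needs a loss and an optimizer.
decoded = decode_model.predict(codes, verbose=0)

# The layer has no activation, so it is linear: row i of the output
# is row i of the kernel plus the bias vector.
kernel, bias = decode.get_weights()
matches_linear_form = np.allclose(decoded, kernel + bias, atol=1e-5)
```

Each row of `decoded` has 784 values and can be reshaped to 28 x 28 for plotting. Because the decoder is linear here, the row for unit i is just the i-th kernel row plus the bias.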

answered Oct 20 '22 at 02:10 by lhk