Coming from TensorFlow, I feel like implementing anything other than basic, sequential models in Keras can be quite tricky. There is just so much going on automatically. In TensorFlow you always know your placeholders (input/output), shapes, structure, and so on, so it is very easy to, for example, set up custom losses.
What is a clean way to define multiple outputs and custom loss functions?
Let's take a simple autoencoder as an example and use MNIST:
from keras.datasets import mnist

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0  # scale to [0, 1]
Short, convolutional encoder:
from keras.layers import (Input, Conv2D, LeakyReLU, MaxPool2D, Flatten,
                          Dense, Reshape, UpSampling2D, Lambda, Activation)
from keras.models import Model
from keras.optimizers import Adam
from keras import backend as K

enc_in = Input(shape=(28, 28, 1), name="enc_in")
x = Conv2D(16, (3, 3))(enc_in)
x = LeakyReLU()(x)
x = MaxPool2D()(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = Flatten()(x)
z = Dense(100, name="z")(x)
enc = Model(enc_in, z, name="encoder")
The decoder has a similar architecture. We do not worry about padding and the decrease in dimensionality due to the convolutions, so we just apply bilinear resizing at the end to match (batch, 28, 28, 1) again:
def resize_images(inputs, dims_xy):
    # bilinear resize via the TensorFlow backend (K.tf)
    x, y = dims_xy
    return Lambda(lambda im: K.tf.image.resize_images(im, (y, x)))(inputs)
# decoder
dec_in = Input(shape=(100,), name="dec_in")
x = Dense(14 * 14 * 8)(dec_in)
x = LeakyReLU()(x)
x = Reshape((14, 14, 8))(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = UpSampling2D()(x)
x = Conv2D(16, (3, 3))(x)
x = LeakyReLU()(x)
x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))
dec = Model(dec_in, dec_out, name="decoder")
We define our own MSE to keep the example simple...
def custom_loss(y_true, y_pred):
    return K.mean(K.square(y_true - y_pred))
...and finally build our complete model:
outputs = dec(enc(enc_in))
ae = Model(enc_in, outputs, name="ae")
ae.compile(optimizer=Adam(lr=1e-4), loss=custom_loss)
# training
ae.fit(x=X_train, y=X_train, batch_size=256, epochs=10)
If I define activation="sigmoid" in the last layer of the decoder in order to get nice images (output interval [0.0, 1.0]), the training loss diverges because Keras does not use the logits automatically but feeds the sigmoid activations into the loss. Thus it is much better and faster for training to use activation="linear" in the last layer. In TensorFlow I would simply define two tensors, logits=x and output=sigmoid(x), so that I could use logits in any custom loss function and output for plotting or other applications.
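For reference, here is a minimal sketch of that pattern in plain TensorFlow (TF 1.x style; all names here are illustrative, not from the model above):

import tensorflow as tf

# two tensors over the same linear layer: logits for the loss, output for plotting
inp = tf.placeholder(tf.float32, (None, 784), name="inp")
y_true = tf.placeholder(tf.float32, (None, 784), name="y_true")
logits = tf.layers.dense(inp, 784)  # raw, unbounded scores
output = tf.sigmoid(logits)         # bounded to [0, 1], for visualization
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=logits))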
How would I do such a thing in Keras?
Additionally, if I have several outputs, how do I use them in custom loss functions, for example the KL divergence in VAEs or the loss terms in GANs?
The functional API guide is not very helpful (especially compared to the extensive TensorFlow guides), since it only covers basic LSTM examples where you do not have to define anything yourself but only use predefined loss functions.
In TensorFlow I would simply define two Tensors logits=x and output=sigmoid(x) to be able to use logits in any custom loss function and output for plotting or other applications.
In Keras you do exactly the same:
x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))  # linear output tensor used for the loss
dec = Model(dec_in, dec_out, name="decoder")

# full autoencoder, trained on the linear output with the custom loss
training_model = Model(enc_in, dec(enc(enc_in)), name="ae")
training_model.compile(optimizer=Adam(lr=1e-4), loss=custom_loss)

# same graph with a sigmoid stacked on top; both models share all weights
sigmoid = Activation("sigmoid")(training_model.output)
inference_model = Model(enc_in, sigmoid)

training_model.fit(x=X_train, y=X_train, batch_size=256, epochs=10)
prediction = inference_model.predict(some_input)
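For several outputs, you can also let Keras do the bookkeeping: give each output layer a name and pass per-output losses (and weights) to compile. A sketch building on the enc/dec models above; the output names "recon" and "latent" and the latent penalty are made up for illustration (it is a plain L2 penalty, not a VAE KL term):

import numpy as np

z = enc(enc_in)
latent = Lambda(lambda t: t, name="latent")(z)       # named identity output
recon = Activation("sigmoid", name="recon")(dec(z))  # named reconstruction output

def latent_l2(y_true, y_pred):
    # y_true is a dummy target; this simply penalizes the latent norm
    return K.mean(K.square(y_pred))

multi = Model(enc_in, [recon, latent])
multi.compile(optimizer=Adam(lr=1e-4),
              loss={"recon": custom_loss, "latent": latent_l2},
              loss_weights={"recon": 1.0, "latent": 0.1})
multi.fit(x=X_train,
          y={"recon": X_train, "latent": np.zeros((len(X_train), 100))},
          batch_size=256, epochs=10)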
In the Keras world your life becomes much easier if you have a single output tensor, because then all the standard Keras features work out of the box. For two outputs/losses, one possible workaround is to concatenate them before output and split them again inside the loss function. A good example is the SSD implementation, which has classification and localization losses: https://github.com/pierluigiferrari/ssd_keras/blob/master/keras_loss_function/keras_ssd_loss.py#L133
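A minimal, self-contained sketch of that concatenate-and-split pattern (the shapes, the two heads, and the two sub-losses are illustrative):

from keras.layers import Input, Dense, Concatenate
from keras.models import Model
from keras import backend as K

inp = Input(shape=(8,))
head_a = Dense(4)(inp)  # e.g. a regression head
head_b = Dense(2)(inp)  # e.g. a second head
packed = Concatenate(axis=-1)([head_a, head_b])  # (batch, 4 + 2)
model = Model(inp, packed)

N1 = 4  # width of head_a, used to split inside the loss

def packed_loss(y_true, y_pred):
    # split the packed tensor back into its two parts
    a_true, b_true = y_true[:, :N1], y_true[:, N1:]
    a_pred, b_pred = y_pred[:, :N1], y_pred[:, N1:]
    return K.mean(K.square(a_true - a_pred)) + K.mean(K.abs(b_true - b_pred))

model.compile(optimizer="adam", loss=packed_loss)
# targets must be packed the same way, e.g. np.concatenate([ya, yb], axis=-1)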
In general, I do not understand these complaints. It is understandable that a new framework causes frustration at first, but Keras is great because it can be simple when you need standard stuff and flexible when you need to go beyond it. The number of complex models implemented in the Keras model zoo is good evidence of that: by reading that code you can learn various patterns for constructing models in Keras.