Keras: clean implementation for multiple outputs and custom loss functions?

Coming from TensorFlow, I feel like implementing anything other than basic, sequential models in Keras can be quite tricky: there is just so much going on automatically. In TensorFlow you always know your placeholders (input/output), shapes, structure, and so on, so it is very easy to, for example, set up custom losses.

What is a clean way to define multiple outputs and custom loss functions?

Let's take a simple autoencoder as an example and use MNIST:

from keras.datasets import mnist
from keras.layers import (Input, Dense, Conv2D, MaxPool2D, Flatten, Reshape,
                          UpSampling2D, LeakyReLU, Lambda, Activation)
from keras.models import Model
from keras.optimizers import Adam
from keras import backend as K

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1)

Short, convolutional encoder:

enc_in = Input(shape=(28, 28, 1), name="enc_in")
x = Conv2D(16, (3, 3))(enc_in)
x = LeakyReLU()(x)
x = MaxPool2D()(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = Flatten()(x)
z = Dense(100, name="z")(x)

enc = Model(enc_in, z, name="encoder")

Similar architecture for the decoder. We don't worry about padding and the loss of spatial size caused by the unpadded convolutions; we simply apply bilinear resizing at the end to get back to (batch, 28, 28, 1):

def resize_images(inputs, dims_xy):
    x, y = dims_xy
    # bilinear resize via the TF backend (Keras 2.x on TensorFlow 1.x)
    return Lambda(lambda im: K.tf.image.resize_images(im, (y, x)))(inputs)

# decoder
dec_in = Input(shape=(100,), name="dec_in")
x = Dense(14 * 14 * 8)(dec_in)
x = LeakyReLU()(x)
x = Reshape((14, 14, 8))(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = UpSampling2D()(x)
x = Conv2D(16, (3, 3))(x)
x = LeakyReLU()(x)
x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))

dec = Model(dec_in, dec_out, name="decoder")

We define our own MSE as an easy example of a custom loss...

def custom_loss(y_true, y_pred):
    return K.mean(K.square(y_true - y_pred))

...and finally build our complete model:

outputs = dec(enc(enc_in))
ae = Model(enc_in, outputs, name="ae")
ae.compile(optimizer=Adam(lr=1e-4), loss=custom_loss)

# training: the autoencoder reconstructs its own input
ae.fit(x=X_train, y=X_train, batch_size=256, epochs=10)

If I set activation="sigmoid" in the last layer of the decoder in order to get nicely bounded images (output interval [0.0, 1.0]), the training loss diverges: Keras does not use the logits automatically, but feeds the sigmoid activations into the loss. Training is therefore much better and faster with activation="linear" in the last layer. In TensorFlow I would simply define two tensors, logits = x and output = sigmoid(x), so that I could use logits in any custom loss function and output for plotting or other applications.
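
For reference, here is a minimal TensorFlow 1.x sketch of what I mean; the dense layer is just an illustrative stand-in for a real decoder, not part of the model above:

import tensorflow as tf  # TensorFlow 1.x

targets = tf.placeholder(tf.float32, [None, 28, 28, 1])
flat = tf.layers.dense(tf.reshape(targets, [-1, 784]), 784)  # stand-in decoder
logits = tf.reshape(flat, [-1, 28, 28, 1])  # raw pre-activations, fed to the loss
output = tf.sigmoid(logits)                 # bounded to [0.0, 1.0] for plotting

# numerically stable: the sigmoid is applied inside the loss op
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits))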

How would I do such a thing in Keras?

Additionally, if I have several outputs, how do I use them in custom loss functions, e.g. for the KL divergence term of a VAE or the loss terms of a GAN?

The functional API guide is not very helpful (especially compared to TensorFlow's very extensive guides), since it only covers basic LSTM examples where you don't have to define anything yourself and only use predefined loss functions.

Asked by daniel451
1 Answer

In TensorFlow I would simply define two tensors, logits = x and output = sigmoid(x), so that I could use logits in any custom loss function and output for plotting or other applications.

In Keras you do exactly the same:

x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))  # logits: the output tensor fed to the loss during training

dec = Model(dec_in, dec_out, name="decoder")

...

# reuse the very same tensor and simply append a sigmoid for inference
sigmoid = Activation("sigmoid")(dec_out)
inference_dec = Model(dec_in, sigmoid, name="decoder_inference")

# train the full autoencoder that ends in the linear (logit) output...
training_model = Model(enc_in, dec(enc(enc_in)), name="ae")
training_model.compile(optimizer=Adam(lr=1e-4), loss=custom_loss)
training_model.fit(x=X_train, y=X_train, batch_size=256, epochs=10)

# ...and predict with the sigmoid-capped variant, which shares all weights
inference_model = Model(enc_in, inference_dec(enc(enc_in)))
prediction = inference_model.predict(some_input)

In the Keras world your life becomes much easier if you have a single output tensor. Then you can use the standard Keras features with it. For two outputs/losses, one possible workaround is to concatenate them before the output and then split them again inside the loss function. A good example is this SSD implementation, which combines classification and localization losses: https://github.com/pierluigiferrari/ssd_keras/blob/master/keras_loss_function/keras_ssd_loss.py#L133
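
As a rough sketch of that concatenate-then-split pattern (the layer sizes, names and the 0.5 weighting below are made up for illustration):

from keras.layers import Input, Dense, Concatenate
from keras.models import Model
from keras import backend as K

inp = Input(shape=(20,))
head_a = Dense(10, name="head_a")(inp)            # first logical output
head_b = Dense(10, name="head_b")(inp)            # second logical output
merged = Concatenate(axis=-1)([head_a, head_b])   # one tensor of shape (batch, 20)

def combined_loss(y_true, y_pred):
    # split the concatenated tensor back into its two logical parts
    a_true, b_true = y_true[:, :10], y_true[:, 10:]
    a_pred, b_pred = y_pred[:, :10], y_pred[:, 10:]
    return K.mean(K.square(a_true - a_pred)) + 0.5 * K.mean(K.square(b_true - b_pred))

model = Model(inp, merged)
model.compile(optimizer="adam", loss=combined_loss)

The targets passed to fit() then have to be concatenated in the same order as the outputs.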

In general, I do not understand these complaints. It is understandable that a new framework causes frustration at first, but Keras is great because it can be simple when you need standard stuff and flexible when you need to go beyond it. The number of complex model implementations in the Keras model zoo is good evidence of that. By reading that code you can learn various patterns for constructing models in Keras.

Answered by Dmytro Prylipko