Autoencoder loss is not decreasing (and starts very high)

I have the following function which is supposed to autoencode my data.

My data can be thought of as an image of length 100, width 2, with 2 channels, i.e. shape (100, 2, 2).

import tensorflow as tf  # TF 1.x-style API

def construct_ae(input_shape):
    encoder_input = tf.placeholder(tf.float32, input_shape, name='x')
    with tf.variable_scope("encoder"):
        flattened = tf.layers.flatten(encoder_input)
        e_fc_1 = tf.layers.dense(flattened, units=150, activation=tf.nn.relu)
        encoded = tf.layers.dense(e_fc_1, units=75, activation=None)

    with tf.variable_scope("decoder"):
        d_fc_1 = tf.layers.dense(encoded, 150, activation=tf.nn.relu)
        d_fc_2 = tf.layers.dense(d_fc_1, 400, activation=None)  # 400 = 100 * 2 * 2, the flattened input size
        decoded = tf.reshape(d_fc_2, input_shape)

    with tf.variable_scope('training'):
        loss = tf.losses.mean_squared_error(labels=encoder_input, predictions=decoded)
        cost = tf.reduce_mean(loss)

        optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
        return optimizer

I'm running into the issue where my cost is on the order of 1.1e9, and it's not decreasing over time.

[Figure: cost over time]

I visualized the gradients (I removed that code because it would just clutter things), and I think something is wrong there, but I'm not sure.

[Figure: gradient visualization]

Questions

1) Does anything in the construction of the network look incorrect?

2) Does the data need to be normalized to [0, 1]?

3) I hit NaNs sometimes when I try increasing the learning rate to 1. Is that indicative of anything?

4) I think I should probably use a CNN, but I ran into the same issues with one, so I thought I'd move to a fully connected network since it's likely easier to debug.

5) I imagine I'm using the wrong loss function, but I can't really find any papers regarding the right loss to use. If anyone can direct me to one, I'd be very appreciative.

Asked Jul 08 '18 by IanQ


1 Answer

  1. Given that this is a plain autoencoder and not a convolutional one, you shouldn't expect very low error rates.
  2. Normalizing does give you faster convergence. However, since your final layer has no activation function that enforces a range on the output, it shouldn't strictly be a problem. Still, do try normalizing your data to [0, 1] and then using a sigmoid activation in the last decoder layer; a sketch combining both changes follows this list.
  3. A very high learning rate can make the optimizer oscillate without converging and/or overshoot far past any local minimum, leading to extremely high error values, which is consistent with the NaNs you see at a learning rate of 1.
  4. Most blog posts (like the Keras autoencoder tutorial) use binary_crossentropy as their loss function, but MSE isn't "wrong"; binary cross-entropy simply assumes the targets lie in [0, 1].
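For reference, here is a minimal sketch of points 2 and 4 in the same TF 1.x style as the question. It assumes the raw data has already been scaled to [0, 1] before being fed in; the function name construct_ae_normalized and the returned tuple are illustrative, not part of the original post.

import tensorflow as tf

def construct_ae_normalized(input_shape):
    # Inputs are assumed to already lie in [0, 1],
    # e.g. x_norm = (x - x.min()) / (x.max() - x.min())
    encoder_input = tf.placeholder(tf.float32, input_shape, name='x')

    with tf.variable_scope("encoder"):
        flattened = tf.layers.flatten(encoder_input)
        e_fc_1 = tf.layers.dense(flattened, units=150, activation=tf.nn.relu)
        encoded = tf.layers.dense(e_fc_1, units=75, activation=None)

    with tf.variable_scope("decoder"):
        d_fc_1 = tf.layers.dense(encoded, 150, activation=tf.nn.relu)
        # Keep the last layer as logits so it can feed a cross-entropy loss;
        # tf.nn.sigmoid(logits) is the reconstruction in [0, 1].
        logits = tf.layers.dense(d_fc_1, 400, activation=None)
        decoded = tf.reshape(tf.nn.sigmoid(logits), input_shape)

    with tf.variable_scope("training"):
        # Element-wise sigmoid cross-entropy, the TF 1.x analogue of Keras'
        # binary_crossentropy; MSE on the sigmoid output would also work.
        flat_labels = tf.layers.flatten(encoder_input)
        cost = tf.losses.sigmoid_cross_entropy(multi_class_labels=flat_labels,
                                               logits=logits)
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    return encoder_input, decoded, cost, optimizer

With inputs in [0, 1] and a sigmoid output, per-element errors stay on the order of 1, so the starting cost is nowhere near 1e9.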

As far as the high starting error is concerned, it all depends on how your parameters are initialized. A good initialization technique gives you starting errors that are not too far from a desired minimum, whereas a naive random or zeros-based initialization almost always leads to this kind of scenario.
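If you want to control the initialization explicitly, here is a hedged sketch of what passing an initializer to the dense layers could look like in the same TF 1.x style (the placeholder shape below is illustrative, standing in for the question's flattened input):

import tensorflow as tf

# Stand-in for the flattened (batch, 400) input from the question.
flattened = tf.placeholder(tf.float32, [None, 400], name='flattened')

# Xavier/Glorot keeps the scale of each layer's outputs roughly constant,
# so the very first reconstruction error is not astronomically large.
init = tf.glorot_uniform_initializer()

e_fc_1 = tf.layers.dense(flattened, units=150, activation=tf.nn.relu,
                         kernel_initializer=init,
                         bias_initializer=tf.zeros_initializer())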

Answered Nov 15 '22 by Anshuman Suri