
Why does the loss of a CNN decrease for a long time and then suddenly increase?

I made a simple network to find broken lines, and I had a very strange training run. The loss, keras.losses.binary_crossentropy, decreased steadily for around 1500 epochs; then it suddenly shot up and plateaued.

[graph of the loss vs. epoch]

What are some reasons this happens? Optimizers, loss function, network structure?

I checked the weights, and none of them have NaN values. The input data is 250,000+ 32x32 images with lines on them, plus the same stack of images where each line has a few pixels removed so it is "broken".

[sample solid lines and broken lines]
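For illustration, the images are along these lines (a simplified sketch of how such a pair could be generated, not my actual data pipeline; the line orientation and gap size here are arbitrary):

import numpy

def make_pair(size=32, gap=3):
    # One 32x32 image with a full line, and a copy with a few pixels knocked out.
    solid = numpy.zeros((size, size), dtype=numpy.float32)
    row = numpy.random.randint(size)
    solid[row, :] = 1.0                       # a horizontal line, for simplicity
    broken = solid.copy()
    start = numpy.random.randint(size - gap)
    broken[row, start:start + gap] = 0.0      # remove a few pixels -> "broken" line
    return solid, broken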

Here is the model creation code:

import keras
from keras import engine, layers

input_shape = (1, 32, 32)
kernel_shape = (16, 16)
keras.backend.set_image_data_format("channels_first")
n_filters = 64

input_layer = engine.Input(input_shape)
active_1 = layers.Activation("relu")(input_layer)
conv_1 = layers.Conv2D(n_filters, kernel_shape)(active_1)
conv_2 = layers.Conv2D(2*n_filters, kernel_shape)(conv_1)
pool_1 = layers.MaxPooling2D()(conv_2)

# tupleFromShape is a helper (not shown) that converts pool_1's shape to a tuple of ints;
# p is the flattened size of the pooled feature map.
s = tupleFromShape(pool_1.shape)
p = 1
for d in s:
    p *= d

shaped_1 = layers.Reshape((p,))(pool_1)
dense_1 = layers.Dense(2)(shaped_1)
out = layers.Activation("softmax")(dense_1)
model = engine.Model(input_layer, out)
model.save("broken-lines-start.h5")

And the training code:

import numpy
import keras

full = ...    # numpy array of shape (c, slices, 32, 32)
broken = ...  # numpy array of shape (c, slices, 32, 32)
full = full[0]
broken = broken[0]

# Hold out 1024 images of each class for validation.
n = len(full) - 1024
n2 = len(broken) - 1024

numpy.random.shuffle(full)    # shuffle along the first axis
numpy.random.shuffle(broken)  # (random.shuffle does not swap numpy rows correctly)

optimizer = keras.optimizers.Adam(0.00001)
loss_function = keras.losses.binary_crossentropy
model.compile(optimizer=optimizer, loss=loss_function)

batch_size = 256
steps = n//batch_size + n2//batch_size
model.fit_generator(generator=getDataGenerator(full[:n], broken[:n2], batch_size),
                    steps_per_epoch=steps,
                    epochs=4680,
                    validation_data=getDataGenerator(full[n:], broken[n2:], batch_size),
                    validation_steps=2048//batch_size,
                    callbacks=[saves_last_epoch_and_best_epoch])  # checkpoint callback, not shown
model.save("broken-lines-trained.h5")

The generator code:

import random
import numpy

def getDataGenerator(solid, broken, batch_size=128):
    # Label each image one-hot: solid -> [1, 0], broken -> [0, 1].
    zed = [([chunk], [1, 0]) for chunk in solid] + [([chunk], [0, 1]) for chunk in broken]
    random.shuffle(zed)
    xbatch = []
    ybatch = []
    while True:
        for i in range(len(zed)):
            x, y = zed[i]
            xbatch.append(x)
            ybatch.append(y)
            if len(xbatch) == batch_size:
                yield numpy.array(xbatch), numpy.array(ybatch)
                xbatch = []
                ybatch = []

I have greatly improved this model, and it hasn't exhibited this behavior yet, but I would like to understand why this happened.

Subsequent things I have tried (sketched in code after this list):

Change the loss function to logcosh -> works.

Change the epsilon value of the Adam optimizer -> still blows up.

Change the optimizer to SGD -> blows up faster, without the initial decrease.
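Roughly, those variants looked like this (simplified; the epsilon value is just a placeholder, not the exact value used):

import keras

# Loss changed to logcosh -> trained stably.
model.compile(optimizer=keras.optimizers.Adam(0.00001), loss=keras.losses.logcosh)

# Adam with a larger epsilon -> still blew up.
model.compile(optimizer=keras.optimizers.Adam(0.00001, epsilon=0.0001), loss=keras.losses.binary_crossentropy)

# Plain SGD -> blew up faster, with no initial decrease.
model.compile(optimizer=keras.optimizers.SGD(0.00001), loss=keras.losses.binary_crossentropy)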



1 Answer

One of the possible issues might be with the Adam optimizer -- it is known to "explode" when you train it for a long time.

Let's look at the formula of Adam (written as pseudocode, in the notation of the TensorFlow docs):

t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)

where m and v are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. After the model has been trained for a long time and the gradients are close to zero, v (a running average of the squared gradients) can become very small.

By default, according to the TensorFlow docs, beta1=0.9 and beta2=0.999, so m adapts much more quickly to a change in the gradients than v does. When the gradients suddenly grow again, m can become large while v has not yet caught up, so the update divides a relatively large number by a very small one and the step explodes.
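To make this concrete, here is a small standalone sketch (my own, not taken from any library) that runs the update rule quoted above on an artificial gradient sequence: a long, nearly-converged stretch of tiny gradients followed by a short burst of moderate ones. The step sizes are reported in units of the base learning rate, and the gradient values are arbitrary choices for illustration:

from math import sqrt

def max_step(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    # Run the Adam update rule quoted above (with lr = 1) and return the
    # largest step magnitude |lr_t * m_t / (sqrt(v_t) + eps)| that occurs.
    m = v = 0.0
    worst = 0.0
    for t, g in enumerate(grads, start=1):
        lr_t = sqrt(1 - beta2 ** t) / (1 - beta1 ** t)   # bias correction, lr = 1
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        worst = max(worst, abs(lr_t * m / (sqrt(v) + eps)))
    return worst

quiet_then_burst = [1e-8] * 10000 + [1e-3] * 30  # near-zero gradients, then a spike

print(max_step([1e-3] * 10000))              # ~1: steady gradients give steps of about one lr
print(max_step(quiet_then_burst))            # ~6.5: m outruns sqrt(v) right after the quiet stretch
print(max_step(quiet_then_burst, eps=1e-3))  # <1: a larger epsilon damps the spike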

Try increasing the epsilon parameter, which is 1e-08 by default. Experiment with values like 0.01 or 0.001, depending on your model.
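In Keras that would look something like this (a sketch keeping the question's learning rate; the exact epsilon value is something to experiment with, and depending on the Keras version the first argument is named lr or learning_rate):

import keras

optimizer = keras.optimizers.Adam(0.00001, epsilon=0.001)  # much larger than the 1e-08 default
model.compile(optimizer=optimizer, loss=keras.losses.binary_crossentropy)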



