I made a simple network to detect broken lines and had a very strange training run. The loss (keras.losses.binary_crossentropy) was decreasing steadily for around 1,500 epochs, then it suddenly shot up and plateaued.

What are some reasons this happens? Optimizers, loss function, network structure?
I checked the weights, and none of them are NaN. The input data is 250,000+ 32x32 images with lines drawn on them, plus the same stack of images with a few pixels removed from each line so that it is "broken".
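For illustration, one full/broken image pair could be produced roughly like this (a simplified sketch; make_line_pair and its parameters are made up here and are not my actual data pipeline):

import numpy

def make_line_pair(size=32, gap=3, rng=numpy.random):
    # draw one random straight line on a size x size canvas
    full = numpy.zeros((size, size), dtype=numpy.float32)
    (r0, c0), (r1, c1) = rng.randint(0, size, size=(2, 2))
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = numpy.linspace(r0, r1, n).round().astype(int)
    cols = numpy.linspace(c0, c1, n).round().astype(int)
    full[rows, cols] = 1.0
    # remove a few consecutive pixels so the copy is "broken"
    broken = full.copy()
    start = rng.randint(0, max(1, n - gap))
    broken[rows[start:start + gap], cols[start:start + gap]] = 0.0
    return full, broken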

Here is the model creation code:
import random
import numpy
import keras
from keras import engine, layers

input_shape = (1, 32, 32)
kernel_shape = (16, 16)
keras.backend.set_image_data_format("channels_first")
n_filters = 64

input_layer = engine.Input(input_shape)
active_1 = layers.Activation("relu")(input_layer)
conv_1 = layers.Conv2D(n_filters, kernel_shape)(active_1)
conv_2 = layers.Conv2D(2 * n_filters, kernel_shape)(conv_1)
pool_1 = layers.MaxPooling2D()(conv_2)

# flatten the pooled feature maps for the dense layer
# (tupleFromShape is a helper, defined elsewhere, that turns the symbolic shape into a tuple of ints)
s = tupleFromShape(pool_1.shape)
p = 1
for d in s:
    p *= d
shaped_1 = layers.Reshape((p,))(pool_1)

dense_1 = layers.Dense(2)(shaped_1)
out = layers.Activation("softmax")(dense_1)

model = engine.Model(input_layer, out)
model.save("broken-lines-start.h5")
And the training code:
full = ...    # numpy array of shape (c, slices, 32, 32)
broken = ...  # numpy array of shape (c, slices, 32, 32)
full = full[0]
broken = broken[0]

# hold out 1024 images of each class for validation
n = len(full) - 1024
n2 = len(broken) - 1024
random.shuffle(full)
random.shuffle(broken)

optimizer = keras.optimizers.Adam(0.00001)
loss_function = keras.losses.binary_crossentropy
model.compile(optimizer=optimizer, loss=loss_function)

batch_size = 256
steps = n // batch_size + n2 // batch_size
model.fit_generator(
    generator=getDataGenerator(full[:n], broken[:n2], batch_size),
    steps_per_epoch=steps,
    epochs=4680,
    validation_data=getDataGenerator(full[n:], broken[n2:], batch_size),
    validation_steps=2048 // batch_size,
    callbacks=[saves_last_epoch_and_best_epoch],
)
model.save("broken-lines-trained.h5")
The generator code:
def getDataGenerator(solid, broken, batch_size=128):
    # label solid chunks as [1, 0] and broken chunks as [0, 1]
    zed = [([chunk], [1, 0]) for chunk in solid] + [([chunk], [0, 1]) for chunk in broken]
    random.shuffle(zed)
    xbatch = []
    ybatch = []
    while True:
        for i in range(len(zed)):
            x, y = zed[i]
            xbatch.append(x)
            ybatch.append(y)
            if len(xbatch) == batch_size:
                yield numpy.array(xbatch), numpy.array(ybatch)
                xbatch = []
                ybatch = []
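As a quick sanity check (not part of the original code), one batch drawn from this generator has the expected shapes:

gen = getDataGenerator(full[:n], broken[:n2], batch_size)
xb, yb = next(gen)
print(xb.shape, yb.shape)   # (256, 1, 32, 32) (256, 2) with batch_size = 256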
I have greatly improved this model, and it hasn't exhibited this behavior yet, but I would like to understand why this happened.
Subsequent things I have tried (rough sketches of these variants follow the list):
Changing the loss function to logcosh -> works.
Changing the epsilon value of the Adam optimizer -> still blows up.
Changing the optimizer to SGD -> blows up faster, without even the initial decrease.
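Roughly, those variants correspond to compile calls like these (the hyperparameter values shown are illustrative, not necessarily the exact ones I used):

# logcosh loss instead of binary cross-entropy -> trained stably
model.compile(optimizer=keras.optimizers.Adam(0.00001), loss=keras.losses.logcosh)

# larger Adam epsilon, same loss -> still blew up
model.compile(optimizer=keras.optimizers.Adam(0.00001, epsilon=1e-4), loss=keras.losses.binary_crossentropy)

# plain SGD -> blew up even faster
model.compile(optimizer=keras.optimizers.SGD(0.00001), loss=keras.losses.binary_crossentropy)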
One of the possible issues might be the Adam optimizer: it is known to "explode" when you train it for a long time.
Let's look at the Adam update rule (as given in the TensorFlow docs):

$$
\begin{aligned}
t &\leftarrow t + 1 \\
\text{lr}_t &\leftarrow \text{lr} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \\
m_t &\leftarrow \beta_1\, m_{t-1} + (1 - \beta_1)\, g \\
v_t &\leftarrow \beta_2\, v_{t-1} + (1 - \beta_2)\, g^2 \\
\text{variable} &\leftarrow \text{variable} - \text{lr}_t \cdot \frac{m_t}{\sqrt{v_t} + \epsilon}
\end{aligned}
$$
where $m$ and $v$ are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. When you train the model for a long time, $v$ can become very small.
By default, according to the TensorFlow docs, $\beta_1 = 0.9$ and $\beta_2 = 0.999$, so $m$ adapts much faster than $v$. When the gradients start growing again after a long quiet stretch, $m$ recovers quickly while $v$ cannot catch up, so the update divides a comparatively large $m$ by a very small $\sqrt{v_t}$ and explodes.
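To see the scale of this mismatch, here is a toy calculation with made-up numbers (not taken from the real training run): after a long stretch of near-zero gradients, a burst of larger gradients pushes the ratio m / (sqrt(v) + epsilon) well above its steady-state value of roughly 1.

beta1, beta2, eps = 0.9, 0.999, 1e-8
m = v = 0.0
# long stretch of near-zero gradients: both moments decay towards tiny values
for _ in range(20000):
    g = 1e-8
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
# gradients suddenly become larger again
for step in range(1, 16):
    g = 1e-3
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    print(step, m / (v ** 0.5 + eps))   # rises from ~3.2 to ~6.5 (steady state would be ~1.0)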
Try increasing the epsilon parameter, which is 1e-08 by default; depending on your model, values like 0.001 or 0.01 are worth experimenting with.
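In Keras that just means constructing Adam with an explicit epsilon (the value below is only an example to experiment with):

# a larger epsilon keeps the denominator sqrt(v) + epsilon away from zero,
# capping the effective step even when v has decayed to almost nothing
optimizer = keras.optimizers.Adam(0.00001, epsilon=1e-3)
model.compile(optimizer=optimizer, loss=keras.losses.binary_crossentropy)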