High bias convolutional neural network not improving with more layers/filters

Question

I am training a convolutional neural network using TensorFlow to classify images of buildings into 5 classes.

Training dataset:  
Class 1 - 3000 images
Class 2 - 3000 images
Class 3 - 3000 images
Class 4 - 3000 images
Class 5 - 3000 images

I started out with a very simple architecture:

Input image - 256 x 256 x 3

Convolutional layer 1 - 128 x 128 x 16 (3x3 filters, 16 filters, stride=2)
Convolutional layer 2 - 64 x 64 x 32 (3x3 filters, 32 filters, stride=2)
Convolutional layer 3 - 32 x 32 x 64 (3x3 filters, 64 filters, stride=2)

Max-pooling layer - 16 x 16 x 64 (2x2 pooling)

Fully-connected layer 1 - 1 x 1024
Fully-connected layer 2 - 1 x 64

Output - 1 x 5

Other details of my network:

Cost-function: tf.softmax_cross_entropy_with_logits
Optimizer: Adam optimizer (Learning rate=0.01, Epsilon=0.1)
Mini-batch size: 5

My cost-function has a high starting value of around 10^10 and then drops rapidly to a value of about 1.6 (after a few hundred iterations) and saturates at that value (no matter how long I train the network for). The cost-function value on the test set is the same. This value is equivalent to predicting approximately equal probability for each class and it makes the same predictions for all images. My predictions look something like this:

[0.191877 0.203651 0.194455 0.200043 0.203081]

Training loss - 3000 iterations

A high error on both the training and test set indicate high bias i.e. underfitting. I increased the complexity of my network by adding layers and increasing the number of filters and my latest network is this (the number of layers and filter sizes are similar to AlexNet):

Input image - 256 x 256 x 3

Convolutional layer 1 - 64 x 64 x 64 (11x11 filters, 64 filters, stride=4)
Convolutional layer 2 - 32 x 32 x 128 (5x5 filters, 128 filters, stride=2)
Convolutional layer 3 - 16 x 16 x 256 (3x3 filters, 256 filters, stride=2)
Convolutional layer 4 - 8 x 8 x 512 (3x3 filters, 512 filters, stride=2)
Convolutional layer 5 - 8 x 8 x 256 (3x3 filters, 256 filters, stride=1)

Fully-connected layer 1 - 1 x 4096
Fully-connected layer 2 - 1 x 4096
Fully-connected layer 3 - 1 x 4096
Dropout layer (0.5 probability)

Output - 1 x 5

However, my cost-function is still saturating at approximately 1.6 and making the same predictions.

My questions are:

What other solutions should I try to fix a high bias network? I have (and still am) trying different learning rates and initialisation of weights - but to no avail.
Is it because my training set is too small? Wouldn't a small training set lead to a high variance network? It would overfit to the training images and have low training error, but high test error.
Is it possible that there are no distinguishable features in these images? However, considering the fact that other CNNs can distinguish between breeds of dogs, this does not seem likely.
As a sanity check, I am training my network on a very small dataset (50 images) and I am expecting it to overfit. However, it doesn't look like it is going to; it looks like the same problem is going to occur.

Code:

import tensorflow as tf
sess = tf.Session()

BATCH_SIZE = 50
MAX_CAPACITY = 300
TRAINING_STEPS = 3001

# To get the list of image filenames and labels from the text file
def read_labeled_image_list(list_filename):

    f = open(list_filename,'r')
    filenames = []
    labels = []

    for line in f:

        filename, label = line[:-1].split(' ')
        filenames.append(filename)
        labels.append(int(label))

    return filenames,labels  

# To get images and labels in batches
def add_to_batch(image,label):

    image_batch,label_batch = tf.train.batch([image,label],batch_size=BATCH_SIZE,num_threads=1,capacity=MAX_CAPACITY)

    return image_batch, tf.reshape(label_batch,[BATCH_SIZE])

# To decode a single image and its label
def read_image_with_label(input_queue):

    """ Image """
    # Read
    file_contents = tf.read_file(input_queue[0])
    example = tf.image.decode_png(file_contents)

    # Reshape
    my_image = tf.cast(example,tf.float32)
    my_image = tf.reshape(my_image,[256,256,3])

    # Normalisation
    my_image = my_image/255
    my_mean = tf.reduce_mean(my_image)

    # Centralisation
    my_image = my_image - my_mean


    """ Label """
    label = input_queue[1]-1

    return add_to_batch(my_image,label)

# Network
def inference(x):

    """ Layer 1: Convolutional """
    # Initialise variables
    W_conv1 = tf.Variable(tf.truncated_normal([11,11,3,64],stddev=0.0001),name='W_conv1')
    b_conv1 = tf.Variable(tf.constant(0.1,shape=[64]),name='b_conv1')

    # Convolutional layer
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x,W_conv1,strides=[1,4,4,1],padding='SAME') + b_conv1)

    """ Layer 2: Convolutional """
    # Initialise variables
    W_conv2 = tf.Variable(tf.truncated_normal([5,5,64,128],stddev=0.0001),name='W_conv2')
    b_conv2 = tf.Variable(tf.constant(0.1,shape=[128]),name='b_conv2')

    # Convolutional layer
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_conv1,W_conv2,strides=[1,2,2,1],padding='SAME') + b_conv2)

    """ Layer 3: Convolutional """
    # Initialise variables
    W_conv3 = tf.Variable(tf.truncated_normal([3,3,128,256],stddev=0.0001),name='W_conv3')
    b_conv3 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv3')

    # Convolutional layer
    h_conv3 = tf.nn.relu(tf.nn.conv2d(h_conv2,W_conv3,strides=[1,2,2,1],padding='SAME') + b_conv3)

    """ Layer 4: Convolutional """
    # Initialise variables
    W_conv4 = tf.Variable(tf.truncated_normal([3,3,256,512],stddev=0.0001),name='W_conv4')
    b_conv4 = tf.Variable(tf.constant(0.1,shape=[512]),name='b_conv4')

    # Convolutional layer
    h_conv4 = tf.nn.relu(tf.nn.conv2d(h_conv3,W_conv4,strides=[1,2,2,1],padding='SAME') + b_conv4)

    """ Layer 5: Convolutional """
    # Initialise variables
    W_conv5 = tf.Variable(tf.truncated_normal([3,3,512,256],stddev=0.0001),name='W_conv5')
    b_conv5 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv5')

    # Convolutional layer
    h_conv5 = tf.nn.relu(tf.nn.conv2d(h_conv4,W_conv5,strides=[1,1,1,1],padding='SAME') + b_conv5)

    """ Layer X: Pooling
    # Pooling layer
    h_pool1 = tf.nn.max_pool(h_conv3,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')"""

    """ Layer 6: Fully-connected """
    # Initialise variables
    W_fc1 = tf.Variable(tf.truncated_normal([8*8*256,4096],stddev=0.0001),name='W_fc1')
    b_fc1 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc1')

    # Multiplication layer
    h_conv5_reshaped = tf.reshape(h_conv5,[-1,8*8*256])
    h_fc1 = tf.nn.relu(tf.matmul(h_conv5_reshaped, W_fc1) + b_fc1)

    """ Layer 7: Fully-connected """
    # Initialise variables
    W_fc2 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc2')
    b_fc2 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc2')

    # Multiplication layer
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, W_fc2) + b_fc2)

    """ Layer 8: Fully-connected """
    # Initialise variables
    W_fc3 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc3')
    b_fc3 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc3')

    # Multiplication layer
    h_fc3 = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)

    """ Layer 9: Dropout layer """
    # Keep/drop nodes with 50% chance
    h_dropout = tf.nn.dropout(h_fc3,0.5)

    """ Readout layer: Softmax """
    # Initialise variables
    W_softmax = tf.Variable(tf.truncated_normal([4096,5],stddev=0.0001),name='W_softmax')
    b_softmax = tf.Variable(tf.constant(0.1,shape=[5]),name='b_softmax')

    # Multiplication layer
    y_conv = tf.nn.relu(tf.matmul(h_dropout,W_softmax) + b_softmax)

    """ Summaries """
    tf.histogram_summary('W_conv1',W_conv1)
    tf.histogram_summary('W_conv2',W_conv2)
    tf.histogram_summary('W_conv3',W_conv3)
    tf.histogram_summary('W_conv4',W_conv4)
    tf.histogram_summary('W_conv5',W_conv5)
    tf.histogram_summary('W_fc1',W_fc1)
    tf.histogram_summary('W_fc2',W_fc2)
    tf.histogram_summary('W_fc3',W_fc3)
    tf.histogram_summary('W_softmax',W_softmax)
    tf.histogram_summary('b_conv1',b_conv1)
    tf.histogram_summary('b_conv2',b_conv2)
    tf.histogram_summary('b_conv3',b_conv3)
    tf.histogram_summary('b_conv4',b_conv4)
    tf.histogram_summary('b_conv5',b_conv5)
    tf.histogram_summary('b_fc1',b_fc1)
    tf.histogram_summary('b_fc2',b_fc2)
    tf.histogram_summary('b_fc3',b_fc3)
    tf.histogram_summary('b_softmax',b_softmax)

    return y_conv

# Training
def cost_function(y_label,y_conv):

    # Reshape y_label to one-hot vectors
    sparse_labels = tf.reshape(y_label,[BATCH_SIZE,1])
    indices = tf.reshape(tf.range(BATCH_SIZE),[BATCH_SIZE,1])
    concated = tf.concat(1,[indices,sparse_labels])
    dense_labels = tf.sparse_to_dense(concated,[BATCH_SIZE,5],1.0,0.0)

    # Cross-entropy
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv,dense_labels))

    # Accuracy
    y_prob = tf.nn.softmax(y_conv)
    correct_prediction = tf.equal(tf.argmax(dense_labels,1), tf.argmax(y_prob,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    # Add to summary
    tf.scalar_summary('loss',cost)
    tf.scalar_summary('accuracy',accuracy)

    return cost, accuracy

def main ():

    # To get list of filenames and labels
    filename = '/labels/filenames_with_labels_server.txt'
    image_list, label_list = read_labeled_image_list(filename)
    images = tf.convert_to_tensor(image_list, dtype=tf.string)
    labels = tf.convert_to_tensor(label_list,dtype=tf.int32)

    # To create the queue
    input_queue = tf.train.slice_input_producer([images,labels],shuffle=True,capacity=MAX_CAPACITY)

    # To train network
    image,label = read_image_with_label(input_queue)
    y_conv = inference(image)
    loss,acc = cost_function(label,y_conv)
    train_step = tf.train.AdamOptimizer(learning_rate=0.001,epsilon=0.1).minimize(loss)

    # To write and merge summaries
    writer = tf.train.SummaryWriter('/SummaryLogs/log', sess.graph)
    merged = tf.merge_all_summaries()

    # To save variables
    saver = tf.train.Saver()

    """ Run session """
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)

    print('Running...')
    for step in range(1,TRAINING_STEPS):

        loss_val,acc_val,_,summary_str = sess.run([loss,acc,train_step,merged])
        writer.add_summary(summary_str,step)
        print "Step %d, Loss %g, Accuracy %g"%(step,loss_val,acc_val)

        if(step == 1):
            save_path = saver.save(sess,'/SavedVariables/model',global_step=step)
        print "Initial model saved: %s"%save_path

    save_path = saver.save(sess,'/SavedVariables/model-final')
    print "Final model saved: %s"%save_path

    """ Close session """
    print('Finished')
    sess.close()

if __name__ == '__main__':
  main()

EDIT:

After making some changes, I managed to get the network to overfit to a small training set of 50 images.

Changes:

Initialization of weights using Xavier initialization
Initialization of bias to zero
No normalisation of images i.e. no division by 255
Centred the images by subtracting the mean pixel value (calculated over the whole training set). In this case, the mean was 114.

Encouraged by this, I proceeded to train my network on the whole training set, only to encounter the SAME issue again. These are the outputs:

Step 1, Loss 1.37815, Accuracy 0.4

y_conv (before softmax):
[[ 0.30913264  0.          1.20176554  0.          0.        ]
 [ 0.          0.          1.23200822  0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          1.65852785  0.01910716  0.        ]
 [ 0.          0.          0.94612855  0.          0.10457891]]

y_prob (after softmax):
[[ 0.1771856   0.130069    0.43260741  0.130069    0.130069  ]
 [ 0.13462381  0.13462381  0.46150482  0.13462381  0.13462381]
 [ 0.2         0.2         0.2         0.2         0.2       ]
 [ 0.1078648   0.1078648   0.56646001  0.1099456   0.1078648 ]
 [ 0.14956713  0.14956713  0.38524282  0.14956713  0.16605586]]

Very quickly it becomes:

Step 39, Loss 1.60944, Accuracy 0.2

y_conv (before softmax):
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]

y_prob (after softmax):
[[ 0.2  0.2  0.2  0.2  0.2]
 [ 0.2  0.2  0.2  0.2  0.2]
 [ 0.2  0.2  0.2  0.2  0.2]
 [ 0.2  0.2  0.2  0.2  0.2]
 [ 0.2  0.2  0.2  0.2  0.2]]

Clearly a y_conv of all zeros is not a good sign. Looking at the histograms, the weight variables do not change after initialization; only the bias variables change.

jlhw · Accepted Answer

This is not so much a "complete" answer but rather a "things you can try if you are facing a similar problem" answer.

I managed to get my network to start to learn something with the following changes:

Xavier initialization of weights
Zero initialization of bias
No normalization of images to [0,1]
Subtracting the mean pixel value (calculated over the whole training set) from the images
No ReLU in the final layer that calculates y_conv

After 3000 iterations of training with a batch-size of 50 images (approximately 10 epochs):

Accuracy and Loss on Training Set

On the testing set it does not perform so well, because my training set is very small and my network was over-fitting; this was expected so I am not surprised there. At least now I know that I have to focus on getting a larger training set, add more regularization or simplify my network.

High bias convolutional neural network not improving with more layers/filters

Tags:

machine-learning

neural-network

tensorflow

deep-learning

conv-neural-network

jlhw

1 Answers

jlhw

Recent Activity

Donate For Us

High bias convolutional neural network not improving with more layers/filters

Tags:

machine-learning

neural-network

tensorflow

deep-learning

conv-neural-network

jlhw

1 Answers

jlhw

Related questions

Recent Activity

Donate For Us