I am training a convolutional neural network using TensorFlow to classify images of buildings into 5 classes.
Training dataset:
Class 1 - 3000 images
Class 2 - 3000 images
Class 3 - 3000 images
Class 4 - 3000 images
Class 5 - 3000 images
I started out with a very simple architecture:
Input image - 256 x 256 x 3
Convolutional layer 1 - 128 x 128 x 16 (3x3 filters, 16 filters, stride=2)
Convolutional layer 2 - 64 x 64 x 32 (3x3 filters, 32 filters, stride=2)
Convolutional layer 3 - 32 x 32 x 64 (3x3 filters, 64 filters, stride=2)
Max-pooling layer - 16 x 16 x 64 (2x2 pooling)
Fully-connected layer 1 - 1 x 1024
Fully-connected layer 2 - 1 x 64
Output - 1 x 5
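As a quick check of these shapes (a side calculation, not part of my training code): with SAME padding, a layer with stride s outputs ceil(input / s) spatial positions, which reproduces the sizes listed above.
# Spatial size after a SAME-padded conv/pool layer: ceil(input / stride)
def same_pad_out(size, stride):
    return (size + stride - 1) // stride

size = 256
for stride in (2, 2, 2, 2):      # conv1, conv2, conv3 (stride 2) and the 2x2 max-pool
    size = same_pad_out(size, stride)
    print(size)                  # 128, 64, 32, 16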
Other details of my network:
Cost function: tf.nn.softmax_cross_entropy_with_logits
Optimizer: Adam optimizer (Learning rate=0.01, Epsilon=0.1)
Mini-batch size: 5
My cost function starts at a very high value (around 10^10), drops rapidly to about 1.6 within a few hundred iterations, and then saturates at that value no matter how long I train the network. The cost on the test set is the same. A value of about 1.6 is roughly ln(5) ≈ 1.609, the cross-entropy of predicting equal probability for each of the 5 classes, and indeed the network makes the same prediction for every image. My predictions look something like this:
[0.191877 0.203651 0.194455 0.200043 0.203081]
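That plateau is exactly the loss of a uniform guess over 5 classes; a quick check (a side calculation, not part of my training code):
import math
# Cross-entropy when the true class is given probability 0.2: -log(0.2) = log(5)
print(-math.log(1.0/5))   # 1.6094...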
A high error on both the training and test sets indicates high bias, i.e. underfitting. I therefore increased the complexity of my network by adding layers and increasing the number of filters; my latest network is shown below (the number of layers and the filter sizes are similar to AlexNet):
Input image - 256 x 256 x 3
Convolutional layer 1 - 64 x 64 x 64 (11x11 filters, 64 filters, stride=4)
Convolutional layer 2 - 32 x 32 x 128 (5x5 filters, 128 filters, stride=2)
Convolutional layer 3 - 16 x 16 x 256 (3x3 filters, 256 filters, stride=2)
Convolutional layer 4 - 8 x 8 x 512 (3x3 filters, 512 filters, stride=2)
Convolutional layer 5 - 8 x 8 x 256 (3x3 filters, 256 filters, stride=1)
Fully-connected layer 1 - 1 x 4096
Fully-connected layer 2 - 1 x 4096
Fully-connected layer 3 - 1 x 4096
Dropout layer (0.5 probability)
Output - 1 x 5
However, my cost-function is still saturating at approximately 1.6 and making the same predictions.
My question is: why does the cost saturate at about 1.6 with near-uniform predictions for every image, and what can I do to get the network to learn?
Code:
import tensorflow as tf
sess = tf.Session()
BATCH_SIZE = 50
MAX_CAPACITY = 300
TRAINING_STEPS = 3001
# To get the list of image filenames and labels from the text file
def read_labeled_image_list(list_filename):
    filenames = []
    labels = []
    with open(list_filename,'r') as f:
        for line in f:
            filename, label = line.strip().split(' ')
            filenames.append(filename)
            labels.append(int(label))
    return filenames,labels
# To get images and labels in batches
def add_to_batch(image,label):
    image_batch,label_batch = tf.train.batch([image,label],batch_size=BATCH_SIZE,num_threads=1,capacity=MAX_CAPACITY)
    return image_batch, tf.reshape(label_batch,[BATCH_SIZE])
# To decode a single image and its label
def read_image_with_label(input_queue):
    """ Image """
    # Read
    file_contents = tf.read_file(input_queue[0])
    example = tf.image.decode_png(file_contents)
    # Reshape
    my_image = tf.cast(example,tf.float32)
    my_image = tf.reshape(my_image,[256,256,3])
    # Normalisation
    my_image = my_image/255
    my_mean = tf.reduce_mean(my_image)
    # Centralisation
    my_image = my_image - my_mean
    """ Label """
    label = input_queue[1]-1
    return add_to_batch(my_image,label)
# Network
def inference(x):
    """ Layer 1: Convolutional """
    # Initialise variables
    W_conv1 = tf.Variable(tf.truncated_normal([11,11,3,64],stddev=0.0001),name='W_conv1')
    b_conv1 = tf.Variable(tf.constant(0.1,shape=[64]),name='b_conv1')
    # Convolutional layer
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x,W_conv1,strides=[1,4,4,1],padding='SAME') + b_conv1)

    """ Layer 2: Convolutional """
    # Initialise variables
    W_conv2 = tf.Variable(tf.truncated_normal([5,5,64,128],stddev=0.0001),name='W_conv2')
    b_conv2 = tf.Variable(tf.constant(0.1,shape=[128]),name='b_conv2')
    # Convolutional layer
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_conv1,W_conv2,strides=[1,2,2,1],padding='SAME') + b_conv2)

    """ Layer 3: Convolutional """
    # Initialise variables
    W_conv3 = tf.Variable(tf.truncated_normal([3,3,128,256],stddev=0.0001),name='W_conv3')
    b_conv3 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv3')
    # Convolutional layer
    h_conv3 = tf.nn.relu(tf.nn.conv2d(h_conv2,W_conv3,strides=[1,2,2,1],padding='SAME') + b_conv3)

    """ Layer 4: Convolutional """
    # Initialise variables
    W_conv4 = tf.Variable(tf.truncated_normal([3,3,256,512],stddev=0.0001),name='W_conv4')
    b_conv4 = tf.Variable(tf.constant(0.1,shape=[512]),name='b_conv4')
    # Convolutional layer
    h_conv4 = tf.nn.relu(tf.nn.conv2d(h_conv3,W_conv4,strides=[1,2,2,1],padding='SAME') + b_conv4)

    """ Layer 5: Convolutional """
    # Initialise variables
    W_conv5 = tf.Variable(tf.truncated_normal([3,3,512,256],stddev=0.0001),name='W_conv5')
    b_conv5 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv5')
    # Convolutional layer
    h_conv5 = tf.nn.relu(tf.nn.conv2d(h_conv4,W_conv5,strides=[1,1,1,1],padding='SAME') + b_conv5)

    """ Layer X: Pooling
    # Pooling layer
    h_pool1 = tf.nn.max_pool(h_conv3,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')"""
""" Layer 6: Fully-connected """
# Initialise variables
W_fc1 = tf.Variable(tf.truncated_normal([8*8*256,4096],stddev=0.0001),name='W_fc1')
b_fc1 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc1')
# Multiplication layer
h_conv5_reshaped = tf.reshape(h_conv5,[-1,8*8*256])
h_fc1 = tf.nn.relu(tf.matmul(h_conv5_reshaped, W_fc1) + b_fc1)
""" Layer 7: Fully-connected """
# Initialise variables
W_fc2 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc2')
b_fc2 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc2')
# Multiplication layer
h_fc2 = tf.nn.relu(tf.matmul(h_fc1, W_fc2) + b_fc2)
""" Layer 8: Fully-connected """
# Initialise variables
W_fc3 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc3')
b_fc3 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc3')
# Multiplication layer
h_fc3 = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)
""" Layer 9: Dropout layer """
# Keep/drop nodes with 50% chance
h_dropout = tf.nn.dropout(h_fc3,0.5)
""" Readout layer: Softmax """
# Initialise variables
W_softmax = tf.Variable(tf.truncated_normal([4096,5],stddev=0.0001),name='W_softmax')
b_softmax = tf.Variable(tf.constant(0.1,shape=[5]),name='b_softmax')
# Multiplication layer
y_conv = tf.nn.relu(tf.matmul(h_dropout,W_softmax) + b_softmax)
""" Summaries """
tf.histogram_summary('W_conv1',W_conv1)
tf.histogram_summary('W_conv2',W_conv2)
tf.histogram_summary('W_conv3',W_conv3)
tf.histogram_summary('W_conv4',W_conv4)
tf.histogram_summary('W_conv5',W_conv5)
tf.histogram_summary('W_fc1',W_fc1)
tf.histogram_summary('W_fc2',W_fc2)
tf.histogram_summary('W_fc3',W_fc3)
tf.histogram_summary('W_softmax',W_softmax)
tf.histogram_summary('b_conv1',b_conv1)
tf.histogram_summary('b_conv2',b_conv2)
tf.histogram_summary('b_conv3',b_conv3)
tf.histogram_summary('b_conv4',b_conv4)
tf.histogram_summary('b_conv5',b_conv5)
tf.histogram_summary('b_fc1',b_fc1)
tf.histogram_summary('b_fc2',b_fc2)
tf.histogram_summary('b_fc3',b_fc3)
tf.histogram_summary('b_softmax',b_softmax)
return y_conv
# Training
def cost_function(y_label,y_conv):
    # Reshape y_label to one-hot vectors
    sparse_labels = tf.reshape(y_label,[BATCH_SIZE,1])
    indices = tf.reshape(tf.range(BATCH_SIZE),[BATCH_SIZE,1])
    concated = tf.concat(1,[indices,sparse_labels])
    dense_labels = tf.sparse_to_dense(concated,[BATCH_SIZE,5],1.0,0.0)
    # Cross-entropy
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv,dense_labels))
    # Accuracy
    y_prob = tf.nn.softmax(y_conv)
    correct_prediction = tf.equal(tf.argmax(dense_labels,1), tf.argmax(y_prob,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Add to summary
    tf.scalar_summary('loss',cost)
    tf.scalar_summary('accuracy',accuracy)
    return cost, accuracy
def main():
    # To get list of filenames and labels
    filename = '/labels/filenames_with_labels_server.txt'
    image_list, label_list = read_labeled_image_list(filename)
    images = tf.convert_to_tensor(image_list, dtype=tf.string)
    labels = tf.convert_to_tensor(label_list,dtype=tf.int32)

    # To create the queue
    input_queue = tf.train.slice_input_producer([images,labels],shuffle=True,capacity=MAX_CAPACITY)

    # To train network
    image,label = read_image_with_label(input_queue)
    y_conv = inference(image)
    loss,acc = cost_function(label,y_conv)
    train_step = tf.train.AdamOptimizer(learning_rate=0.001,epsilon=0.1).minimize(loss)

    # To write and merge summaries
    writer = tf.train.SummaryWriter('/SummaryLogs/log', sess.graph)
    merged = tf.merge_all_summaries()

    # To save variables
    saver = tf.train.Saver()

    """ Run session """
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    print('Running...')

    for step in range(1,TRAINING_STEPS):
        loss_val,acc_val,_,summary_str = sess.run([loss,acc,train_step,merged])
        writer.add_summary(summary_str,step)
        print("Step %d, Loss %g, Accuracy %g"%(step,loss_val,acc_val))
        if step == 1:
            save_path = saver.save(sess,'/SavedVariables/model',global_step=step)
            print("Initial model saved: %s"%save_path)

    save_path = saver.save(sess,'/SavedVariables/model-final')
    print("Final model saved: %s"%save_path)

    """ Close session """
    print('Finished')
    sess.close()

if __name__ == '__main__':
    main()
EDIT:
After making some changes, I managed to get the network to overfit a small training set of 50 images. Encouraged by this, I proceeded to train my network on the whole training set, only to encounter the SAME issue again. These are the outputs:
Step 1, Loss 1.37815, Accuracy 0.4
y_conv (before softmax):
[[ 0.30913264 0. 1.20176554 0. 0. ]
[ 0. 0. 1.23200822 0. 0. ]
[ 0. 0. 0. 0. 0. ]
[ 0. 0. 1.65852785 0.01910716 0. ]
[ 0. 0. 0.94612855 0. 0.10457891]]
y_prob (after softmax):
[[ 0.1771856 0.130069 0.43260741 0.130069 0.130069 ]
[ 0.13462381 0.13462381 0.46150482 0.13462381 0.13462381]
[ 0.2 0.2 0.2 0.2 0.2 ]
[ 0.1078648 0.1078648 0.56646001 0.1099456 0.1078648 ]
[ 0.14956713 0.14956713 0.38524282 0.14956713 0.16605586]]
Very quickly it becomes:
Step 39, Loss 1.60944, Accuracy 0.2
y_conv (before softmax):
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
y_prob (after softmax):
[[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]]
Clearly, a y_conv of all zeros is not a good sign. Looking at the histograms, the weight variables do not change after initialization; only the bias variables change.
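One way to check this directly (a diagnostic sketch added for illustration; it assumes the W_* tensors are made accessible outside inference(), e.g. by returning them alongside y_conv) is to look at the gradient magnitudes of a few weight matrices:
# If every ReLU output is zero, the gradients flowing back into the upstream
# weights are also zero, so the weights stay at their initial values while
# the final-layer biases can still drift.
grads = tf.gradients(loss, [W_conv1, W_fc1, W_softmax])
grad_norms = [tf.sqrt(tf.reduce_sum(tf.square(g))) for g in grads]
print(sess.run(grad_norms))   # values near 0.0 indicate dead units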
ANSWER:
This is not so much a "complete" answer but rather a "things you can try if you are facing a similar problem" answer.
I managed to get my network to start to learn something with the following changes:
Removed the ReLU from the readout layer, so y_conv now holds raw logits rather than ReLU-clipped values (a minimal sketch of this change is shown below).
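A minimal sketch of what the readout layer looks like without the ReLU (same variable names as in inference() above; everything else in the layer is unchanged):
    """ Readout layer: raw logits (no ReLU) """
    # Initialise variables
    W_softmax = tf.Variable(tf.truncated_normal([4096,5],stddev=0.0001),name='W_softmax')
    b_softmax = tf.Variable(tf.constant(0.1,shape=[5]),name='b_softmax')
    # No ReLU here: tf.nn.softmax_cross_entropy_with_logits expects unbounded logits
    y_conv = tf.matmul(h_dropout,W_softmax) + b_softmax
With the ReLU in place, any logit that goes negative is clamped to zero and stops receiving gradient, which is consistent with the all-zero y_conv shown above.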
After 3000 iterations of training with a batch size of 50 images (approximately 10 epochs), the network was able to fit the training set. On the test set it does not perform as well, because my training set is very small and the network was over-fitting; this was expected, so I am not surprised. At least now I know that I have to focus on getting a larger training set, adding more regularization (for example the L2 penalty sketched below), or simplifying my network.
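For the regularization option, one standard approach (a sketch only; 'weights' and 'weight_decay' are hypothetical names, not variables in my code) is to add an L2 penalty on the weight matrices to the cross-entropy cost from cost_function():
# Hypothetical: L2 weight decay added to the existing cross-entropy cost.
# 'weights' would be the list of W_* variables from inference();
# weight_decay is a tunable hyperparameter.
weight_decay = 1e-4
l2_penalty = weight_decay * tf.add_n([tf.nn.l2_loss(w) for w in weights])
total_cost = cost + l2_penalty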