
Why does this neural network learn nothing?

I am learning TensorFlow and was implementing a simple neural network as explained in MNIST for Beginners in the TensorFlow docs. Here is the link. The accuracy was about 80–90%, as expected.

The same article is followed by MNIST for Experts, which uses a ConvNet. Instead of implementing that, I decided to improve the beginner part. I know about neural nets and how they learn, and that deep networks can perform better than shallow ones, so I modified the original program from MNIST for Beginners to implement a neural network with two hidden layers of 16 neurons each.

It looks something like this:

[Image: diagram of the neural network I built]

Code for it:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')

# We are going to make 2 hidden layers with 16 neurons each

# All the weights in network
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.zeros([784, 16]))
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.zeros([16, 16]))
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.zeros([16, 10]))

# All the biases for the network
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.zeros([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.zeros([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.zeros([10]))


def build_graph():
    """This functions wires up all the biases and weights of the network
    and returns the last layer connections
    :return: returns the activation in last layer of network/output layer without softmax
    """
    A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
    A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
    return tf.matmul(A2, W2) + B2


def print_accuracy(sx, sy, tf_session):
    """This function prints the accuracy of a model at the time of invocation
    :return: None
    """
    correct_prediction = tf.equal(tf.argmax(y), tf.argmax(tf.nn.softmax(build_graph())))
    correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
    accuracy = tf.reduce_mean(correct_prediction_float)

    print(accuracy.eval(feed_dict={x: sx, y: sy}, session=tf_session))


y_predicted = build_graph()

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))

model = tf.train.GradientDescentOptimizer(0.03).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(50)
        if _ % 100 == 0:
            print_accuracy(batch_x, batch_y, sess)
        sess.run(model, feed_dict={x: batch_x, y: batch_y})

I expected the output to be better than what could be achieved with just a single layer (assuming W0 has shape [784, 10] and B0 has shape [10]):

def build_graph():
    return tf.matmul(x, W0) + B0

Instead, the output shows that the network was not training at all. The accuracy never crossed 20% in any iteration.

Output

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1

My Question

What is wrong with the above program that it does not learn at all? How can I improve it further without using convolutional neural networks?

asked Dec 29 '17 by coder3101

1 Answer

Your main mistake is network symmetry: because you initialized all the weights to zeros, the gradients are identical (here, zero) and the weights are never updated. Change them to small random numbers and the network will start learning. It's OK to initialize the biases with zeros.
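
For example, a minimal sketch of the fix applied to the question's code, keeping the zero biases (the stddev value is an illustrative choice, not from the original code):

# Small random weights break the symmetry; stddev=0.1 is illustrative.
W0 = tf.Variable(tf.truncated_normal([784, 16], stddev=0.1), name='InputLayerWeights')
W1 = tf.Variable(tf.truncated_normal([16, 16], stddev=0.1), name='HiddenLayer1Weights')
W2 = tf.Variable(tf.truncated_normal([16, 10], stddev=0.1), name='HiddenLayer2Weights')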

Another issue is purely technical: the print_accuracy function creates new nodes in the computational graph each time it is called, and since you call it in a loop, the graph gets bloated and will eventually use up all the memory.
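
A sketch of one way to fix it: build the accuracy ops a single time, outside the loop, reusing the y_predicted tensor that is already defined, so print_accuracy only evaluates existing nodes:

# Built once, at graph-construction time.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

def print_accuracy(sx, sy, tf_session):
    """Evaluates the pre-built accuracy op; adds no new nodes to the graph."""
    print(accuracy.eval(feed_dict={x: sx, y: sy}, session=tf_session))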

You might also want to play with the hyperparameters and make the network bigger to increase its capacity.
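
For instance, a wider variant might look like this (the 128/64 hidden-layer sizes are illustrative, not from the answer):

# Hypothetical capacity increase: only the layer sizes change.
H1, H2 = 128, 64
W0 = tf.Variable(tf.truncated_normal([784, H1], stddev=0.1), name='InputLayerWeights')
W1 = tf.Variable(tf.truncated_normal([H1, H2], stddev=0.1), name='HiddenLayer1Weights')
W2 = tf.Variable(tf.truncated_normal([H2, 10], stddev=0.1), name='HiddenLayer2Weights')
B0 = tf.Variable(tf.zeros([H1]), name='HiddenLayer1Biases')
B1 = tf.Variable(tf.zeros([H2]), name='HiddenLayer2Biases')
B2 = tf.Variable(tf.zeros([10]), name='OutputLayerBiases')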

Edit: I also spotted a bug in your accuracy calculation. It should be:

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))

Here's the complete code:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')

# Small random weights break the symmetry between neurons.
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.truncated_normal([784, 16]) * 0.001)
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.truncated_normal([16, 16]) * 0.001)
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.truncated_normal([16, 10]) * 0.001)

# Constant bias initialization is fine (zeros would also work).
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.ones([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.ones([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.ones([10]))

# Forward pass: two ReLU hidden layers, then linear logits.
A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
y_predicted = tf.matmul(A2, W2) + B2

# Accuracy and loss ops are built once, outside the training loop.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
accuracy = tf.reduce_mean(correct_prediction_float)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

mnist = input_data.read_data_sets('mnist', one_hot=True)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch_x, batch_y = mnist.train.next_batch(64)
    _, cost_val, acc_val = sess.run([optimizer, cross_entropy, accuracy], feed_dict={x: batch_x, y: batch_y})
    if i % 100 == 0:
      print('cost=%.3f accuracy=%.3f' % (cost_val, acc_val))
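
As a follow-up (not part of the original answer), you can check accuracy on the held-out MNIST test set once training finishes, while still inside the with tf.Session() block:

  # Hypothetical addition: evaluate the trained model on the 10k-image test set.
  print('test accuracy=%.3f' % sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))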

answered Oct 19 '22 by Maxim