How to create a 2-layer neural network using TensorFlow and Python on MNIST data

I'm a newbie in machine learning, and I'm following TensorFlow's tutorial to build some simple neural networks that learn the MNIST data.

I built a single-layer network (following the tutorial) and the accuracy was about 0.92, which is OK for me. But when I added one more layer, the accuracy dropped to 0.113, which is very bad.

Below is how the two layers are connected:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])  # flattened 28x28 input images

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.nn.softmax(tf.matmul(x, W1) + b1)

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

Is my structure fine? What makes it perform so badly? How should I modify my network?

asked Jul 01 '16 by Tai Christian

People also ask

What is a two-layer neural network?

There are two layers in our neural network (note that layers are counted from the first hidden layer up to the output layer, so the input layer is not counted). Moreover, each layer is fully connected to the next. The hidden layer uses a ReLU nonlinearity, whereas the output layer uses a softmax loss function.
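
As an illustration of that description, here is a minimal TF1-style sketch (my own, not from this page) of a fully connected ReLU hidden layer followed by a softmax output; the 784/100/10 sizes are chosen to match the question:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# hidden layer: fully connected, ReLU nonlinearity
W1 = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
h = tf.nn.relu(tf.matmul(x, W1) + b1)

# output layer: fully connected; the softmax is applied inside the loss
W2 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h, W2) + b2
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))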


1 Answer

The input of the second layer is the softmax of the output of the first layer. You don't want to do that.

You're forcing the sum of these values to be 1. If some value of tf.matmul(x, W1) + b1 is about 0 (and some certainly are), the softmax operation lowers that value towards 0. The result: you're killing the gradient, and nothing can flow through those neurons.
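
To make that concrete, here is a tiny illustration (mine, not from the answer): with 100 zero-initialized hidden units, every pre-activation is 0, so the softmax squashes each of them to 1/100 = 0.01, a nearly flat signal:

import tensorflow as tf

logits = tf.zeros([1, 100])        # what tf.matmul(x, W1) + b1 yields with
                                   # zero-initialized weights and biases
probs = tf.nn.softmax(logits)

with tf.Session() as sess:
    print(sess.run(probs)[0, :5])  # -> [0.01 0.01 0.01 0.01 0.01]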

If you remove the softmax between the layers (but keep the softmax on the output layer if you want to interpret the values as probabilities), your network will work fine.

Tl;dr:

import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])

#layer 1
W1 = tf.Variable(tf.zeros([784, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.matmul(x, W1) + b1  # softmax removed here

#layer 2
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.nn.softmax(tf.matmul(y1, W2) + b2)

#output
y = y2
y_ = tf.placeholder(tf.float32, [None, 10])
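
For completeness, here is a self-contained TF1-style training sketch built around this fix (my own illustration, not part of the answer). It uses the tutorial's input_data helper, and the learning rate, batch size, and step count are arbitrary choices. Note that I've also swapped the zero initialization for small random values: with all-zero weights the hidden layer receives zero gradient, so training would stall near chance level.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# small random initialization instead of zeros (my change, see above)
W1 = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.matmul(x, W1) + b1                      # no softmax between layers

W2 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(y1, W2) + b2

# softmax only at the output, folded into the loss for numerical stability
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))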
answered Sep 26 '22 by nessuno