I've built an MLP with Google's TensorFlow library. The network is working but somehow it refuses to learn properly. It always converges to an output of nearly 1.0 no matter what the input actually is.
The complete code can be seen here.
Any ideas?
The input and output (batch size 4) is as follows:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
Hidden layer configuration:
# hidden layer's bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")
# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
Output layer configuration:
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
My learning methods look like this:
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.01) # take a gradient descent for optimizing
train = optimizer.minimize(loss) # let the optimizer train
I tried both setups for cross entropy:
cross_entropy = -tf.reduce_sum(n_output * tf.log(output))
and
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)
where n_output
is the original output as described in output_data
and output
the predicted/calculated value by my network.
The training inside the for-loop (for n epochs) goes like this:
cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
feed_dict={n_input: input_data, n_output: output_data})
I am saving the outcome to cvalues for debug printig of loss
, W_hidden
, ...
No matter what I've tried, when I test my network, trying to validate the output, it always produces something like this:
(...)
step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425 0.53248233 -0.21644847 -0.54775208 0.52298909]
[ 0.73933059 0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
[ 0.78912479]
[ 1.4831928 ]
[ 1.28612828]
[ 1.12486529]]
(--- finished with 2000 epochs ---)
(Test input for validation:)
input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]
So it is not learning properly but always converging to nearly 1.0 no matter which input is fed.
A perceptron can only converge on linearly separable data. Therefore, it isn't capable of imitating the XOR function.
MLP solves the XOR problem efficiently by visualizing the data points in multi-dimensions and thus constructing an n-variable equation to fit in the output values.
The XOR problem with neural networks can be solved by using Multi-Layer Perceptrons or a neural network architecture with an input layer, hidden layer, and output layer. So during the forward propagation through the neural networks, the weights get updated to the corresponding layers and the XOR logic gets executed.
It was first pointed out by Papert and Minsky that a single neuron cannot learn the XOR function since a single hyperplane (line in this case) cannot separate the output classes for this function definition.
In the meanwhile with the help of a colleague I were able to fix my solution and wanted to post it for completeness. My solution works with cross entropy and without altering the training data. Additionally it has the desired input shape of (1, 2) and ouput is scalar.
It makes use of an AdamOptimizer
which decreases the error much faster than a GradientDescentOptimizer
. See this post for more information (& questions^^) about the optimizer.
In fact, my network produces reasonably good results in only 400-800 learning steps.
After 2000 learning steps the output is nearly "perfect":
step: 2000
loss: 0.00103311243281
input: [0.0, 0.0] | output: [[ 0.00019799]]
input: [0.0, 1.0] | output: [[ 0.99979786]]
input: [1.0, 0.0] | output: [[ 0.99996307]]
input: [1.0, 1.0] | output: [[ 0.00033751]]
import tensorflow as tf
#####################
# preparation stuff #
#####################
# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2] and output's [1, 1]
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
# number of neurons in the hidden layer
hidden_nodes = 5
################
# hidden layer #
################
# hidden layer's bias neuron
b_hidden = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")
# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_normal([2, hidden_nodes]), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
################
# output layer #
################
W_output = tf.Variable(tf.random_normal([hidden_nodes, 1]), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
############
# learning #
############
cross_entropy = -(n_output * tf.log(output) + (1 - n_output) * tf.log(1 - output))
# cross_entropy = tf.square(n_output - output) # simpler, but also works
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.AdamOptimizer(0.01) # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss) # let the optimizer train
####################
# initialize graph #
####################
init = tf.initialize_all_variables()
sess = tf.Session() # create the session and therefore the graph
sess.run(init) # initialize all variables
#####################
# train the network #
#####################
for epoch in xrange(0, 2001):
# run the training operation
cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
feed_dict={n_input: input_data, n_output: output_data})
# print some debug stuff
if epoch % 200 == 0:
print("")
print("step: {:>3}".format(epoch))
print("loss: {}".format(cvalues[1]))
# print("b_hidden: {}".format(cvalues[3]))
# print("W_hidden: {}".format(cvalues[2]))
# print("W_output: {}".format(cvalues[4]))
print("")
print("input: {} | output: {}".format(input_data[0], sess.run(output, feed_dict={n_input: [input_data[0]]})))
print("input: {} | output: {}".format(input_data[1], sess.run(output, feed_dict={n_input: [input_data[1]]})))
print("input: {} | output: {}".format(input_data[2], sess.run(output, feed_dict={n_input: [input_data[2]]})))
print("input: {} | output: {}".format(input_data[3], sess.run(output, feed_dict={n_input: [input_data[3]]})))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With