TensorFlow MLP not training XOR

Tags:

I've built an MLP with Google's TensorFlow library. The network is working but somehow it refuses to learn properly. It always converges to an output of nearly 1.0 no matter what the input actually is.

The complete code can be seen here.

Any ideas?

The input and output (batch size 4) is as follows:

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]  # XOR output

n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

Hidden layer configuration:

# hidden layer's bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")

# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")

# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

Output layer configuration:

W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights")  # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output))  # calc output layer's activation

My learning methods look like this:

loss = tf.reduce_mean(cross_entropy)  # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.01)  # take a gradient descent for optimizing
train = optimizer.minimize(loss)  # let the optimizer train

I tried both setups for cross entropy:

cross_entropy = -tf.reduce_sum(n_output * tf.log(output))

and

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)

where n_output is the original output as described in output_data and output the predicted/calculated value by my network.

The training inside the for-loop (for n epochs) goes like this:

cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                   feed_dict={n_input: input_data, n_output: output_data})

I am saving the outcome to cvalues for debug printig of loss, W_hidden, ...

No matter what I've tried, when I test my network, trying to validate the output, it always produces something like this:

(...)
step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425  0.53248233 -0.21644847 -0.54775208  0.52298909]
 [ 0.73933059  0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
 [ 0.78912479]
 [ 1.4831928 ]
 [ 1.28612828]
 [ 1.12486529]]

(--- finished with 2000 epochs ---)

(Test input for validation:)

input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]

So it is not learning properly but always converging to nearly 1.0 no matter which input is fed.

839

asked Nov 30 '15 11:11

daniel451

1 Answers

In the meanwhile with the help of a colleague I were able to fix my solution and wanted to post it for completeness. My solution works with cross entropy and without altering the training data. Additionally it has the desired input shape of (1, 2) and ouput is scalar.

It makes use of an AdamOptimizer which decreases the error much faster than a GradientDescentOptimizer. See this post for more information (& questions^^) about the optimizer.

In fact, my network produces reasonably good results in only 400-800 learning steps.

After 2000 learning steps the output is nearly "perfect":

step: 2000
loss: 0.00103311243281

input: [0.0, 0.0] | output: [[ 0.00019799]]
input: [0.0, 1.0] | output: [[ 0.99979786]]
input: [1.0, 0.0] | output: [[ 0.99996307]]
input: [1.0, 1.0] | output: [[ 0.00033751]]

import tensorflow as tf    

#####################
# preparation stuff #
#####################

# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]  # XOR output

# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2] and output's [1, 1]
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

# number of neurons in the hidden layer
hidden_nodes = 5


################
# hidden layer #
################

# hidden layer's bias neuron
b_hidden = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")

# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_normal([2, hidden_nodes]), name="hidden_weights")

# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)


################
# output layer #
################

W_output = tf.Variable(tf.random_normal([hidden_nodes, 1]), name="output_weights")  # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output))  # calc output layer's activation


############
# learning #
############
cross_entropy = -(n_output * tf.log(output) + (1 - n_output) * tf.log(1 - output))
# cross_entropy = tf.square(n_output - output)  # simpler, but also works

loss = tf.reduce_mean(cross_entropy)  # mean the cross_entropy
optimizer = tf.train.AdamOptimizer(0.01)  # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss)  # let the optimizer train


####################
# initialize graph #
####################
init = tf.initialize_all_variables()

sess = tf.Session()  # create the session and therefore the graph
sess.run(init)  # initialize all variables  

#####################
# train the network #
#####################
for epoch in xrange(0, 2001):
    # run the training operation
    cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})

    # print some debug stuff
    if epoch % 200 == 0:
        print("")
        print("step: {:>3}".format(epoch))
        print("loss: {}".format(cvalues[1]))
        # print("b_hidden: {}".format(cvalues[3]))
        # print("W_hidden: {}".format(cvalues[2]))
        # print("W_output: {}".format(cvalues[4]))


print("")
print("input: {} | output: {}".format(input_data[0], sess.run(output, feed_dict={n_input: [input_data[0]]})))
print("input: {} | output: {}".format(input_data[1], sess.run(output, feed_dict={n_input: [input_data[1]]})))
print("input: {} | output: {}".format(input_data[2], sess.run(output, feed_dict={n_input: [input_data[2]]})))
print("input: {} | output: {}".format(input_data[3], sess.run(output, feed_dict={n_input: [input_data[3]]})))

153

answered Oct 25 '22 06:10

daniel451

Related questions
                            
                                Conditional Substitution of values in pandas dataframe columns
                            
                                string.format() with {} inside string as string [duplicate]
                            
                                Using odo to migrate data to SQL
                            
                                Django: skip system check when running custom command
                            
                                Flask, not all arguments converted during string formatting
                            
                                Pythonic way to generate string rotations
                            
                                Python intersection of 2 lists of dictionaries
                            
                                How to get the text from a checkbutton in python ? (Tkinter)
                            
                                Reading PNG with PIL in Python
                            
                                Seaborn ticklabels are being truncated
                            
                                Create heatmap in python matplotlib with x and y labels from dict with {tuple:float} format
                            
                                Trouble deleting certain nested JSON objects in python
                            
                                Libxml2 installation onto Mac
                            
                                How to scrape dynamic webpages by Python
                            
                                How can I select n items and skip m from ndarray in python?
                            
                                Can I create list from regular expressions?
                            
                                youtube-dl: setting metadata attributes and embedding thumbnail from python?
                            
                                Merging dictionary keys if values the same
                            
                                Plot a pandas dataframe grouped by column
                            
                                AWS DynamoDB - Load data with Boto3 using JSON file as input

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

TensorFlow MLP not training XOR

Tags:

python

machine-learning

neural-network

tensorflow

supervised-learning

daniel451

People also ask

1 Answers

daniel451

Recent Activity

Donate For Us