I tried to build a simple MLP with an input layer (2 neurons), a hidden layer (5 neurons) and an output layer (1 neuron). I planned to train it on [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] to get the desired output of [0., 1., 1., 0.] (elementwise), i.e. XOR.
Unfortunately my code refuses to run. I keep getting dimensionality errors no matter what I try. Quite frustrating :/ I think I'm missing something, but I cannot figure out what is wrong.
For better readability I also uploaded the code to a pastebin: code
Any ideas?
import tensorflow as tf
#####################
# preparation stuff #
#####################
# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [0., 1., 1., 0.] # XOR output
# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2]
n_input = tf.placeholder(tf.float32, shape=[None, 2])
# number of neurons in the hidden layer
hidden_nodes = 5
################
# hidden layer #
################
b_hidden = tf.Variable(0.1) # hidden layer's bias neuron
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0)) # hidden layer's weight matrix
# initialized with a uniform distribution
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden) # calc hidden layer's activation
################
# output layer #
################
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0)) # output layer's weight matrix
output = tf.sigmoid(tf.matmul(W_output, hidden)) # calc output layer's activation
############
# learning #
############
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(output, n_input) # calc cross entropy between current
# output and desired output
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.1) # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss) # let the optimizer train
####################
# initialize graph #
####################
init = tf.initialize_all_variables()
sess = tf.Session() # create the session and therefore the graph
sess.run(init) # initialize all variables
# train the network
for epoch in xrange(0, 201):
    sess.run(train) # run the training operation
    if epoch % 20 == 0:
        print("step: {:>3} | W: {} | b: {}".format(epoch, sess.run(W_hidden), sess.run(b_hidden)))
EDIT: I am still getting errors :/
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
outputs line 27 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible. Altering the line to:
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
seems to be working, but then the error appears in:
output = tf.sigmoid(tf.matmul(hidden, W_output))
telling me: line 34 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible
Turning the statement to:
output = tf.sigmoid(tf.matmul(W_output, hidden))
also throws an exception: line 34 (...) ValueError: Dimensions Dimension(1) and Dimension(5) are not compatible.
EDIT2: I do not really understand this. Shouldn't hidden be W_hidden x n_input.T, since in dimensions this would be (5, 2) x (2, 1)? If I transpose n_input, the hidden line still works (I don't even understand why it works without a transpose at all). However, output keeps throwing errors, although in dimensions this operation should be (1, 5) x (5, 1)?!
(0) It's helpful to include the error output - it's also a useful thing to look at, because it does identify exactly where you were having shape problems.
(1) The shape errors arose because you have the arguments to matmul backwards in both of your matmuls, and the shape of the W_hidden tf.Variable backwards. The general rule is that the weights for a layer with input_size inputs and output_size outputs should have shape [input_size, output_size], and the matmul should be tf.matmul(input_to_layer, weights_for_layer) (and then add the biases, which are of shape [output_size]).
So with your code,
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0))
should be:
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))
and
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
should be tf.matmul(n_input, W_hidden); and
output = tf.sigmoid(tf.matmul(W_output, hidden))
should be tf.matmul(hidden, W_output)
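To see why each variant from the question's EDIT fails, it can help to trace the shapes by hand. Here's a rough sketch in plain Python (no TensorFlow needed). The helpers matmul_shape and describe are made up for illustration and are not TensorFlow functions; the sketch also assumes that an unknown batch dimension (None) is treated as compatible with any size, which would explain why tf.matmul(W_hidden, n_input) appeared to work.

```python
def matmul_shape(a, b):
    """Return the static shape of matmul(a, b) for two 2-D shapes.

    None stands for an unknown dimension (the placeholder's batch size).
    Assumption: an unknown dimension is compatible with any size, so it
    never fails the check.
    """
    (rows, inner_a), (inner_b, cols) = a, b
    if inner_a is not None and inner_b is not None and inner_a != inner_b:
        raise ValueError(
            "Dimensions {} and {} are not compatible".format(inner_a, inner_b))
    return (rows, cols)


def describe(a, b):
    """Return the result shape as a tuple, or the error message as a string."""
    try:
        return matmul_shape(a, b)
    except ValueError as err:
        return str(err)


n_input_shape = (None, 2)  # the placeholder: [batch, 2]

# Layout from the question: W_hidden is [5, 2], W_output is [5, 1].
print(describe((5, 2), n_input_shape))  # W_hidden x n_input: 2 vs None passes, giving (5, 2)
print(describe(n_input_shape, (5, 2)))  # n_input x W_hidden: 2 vs 5 fails
print(describe((5, 2), (5, 1)))         # hidden x W_output: 2 vs 5 fails
print(describe((5, 1), (5, 2)))         # W_output x hidden: 1 vs 5 fails

# Layout from this answer: W_hidden is [2, 5], W_output stays [5, 1].
hidden_shape = matmul_shape(n_input_shape, (2, 5))  # [batch, 5]
output_shape = matmul_shape(hidden_shape, (5, 1))   # [batch, 1]
print(hidden_shape, output_shape)
```

With the [input_size, output_size] layout the batch dimension stays in front through both layers, so the network yields one output per input row.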
(2) Once you've fixed those bugs, your sess.run call needs to be given a feed_dict, because n_input is a placeholder and has no value until you feed one:
sess.run(train)
should be:
sess.run(train, feed_dict={n_input: input_data})
At least, I presume that this is what you're trying to achieve.
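To get a feel for what feeding all of input_data at once means for the shapes, here's a plain-Python stand-in for the corrected forward pass. It's only a sketch under assumptions, not TensorFlow code: matmul, dense and uniform_matrix are hypothetical helpers, the weights are just random, and b_output is a made-up zero bias that the question's code does not have.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Compute sigmoid(inputs x weights + biases), one bias per output unit."""
    return [[sigmoid(v + b) for v, b in zip(row, biases)]
            for row in matmul(inputs, weights)]


def uniform_matrix(rows, cols, rng):
    """Random weights in [-1, 1], like random_uniform in the question."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_nodes = 5

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # the fed batch: [4, 2]
W_hidden = uniform_matrix(2, hidden_nodes, rng)         # [input_size, output_size] = [2, 5]
b_hidden = [0.1] * hidden_nodes                         # [output_size] = [5]
W_output = uniform_matrix(hidden_nodes, 1, rng)         # [5, 1]
b_output = [0.0]                                        # hypothetical, not in the question

hidden = dense(input_data, W_hidden, b_hidden)          # [4, 5]
output = dense(hidden, W_output, b_output)              # [4, 1]

print(len(hidden), len(hidden[0]))  # 4 5
print(len(output), len(output[0]))  # 4 1
```

Since the output comes out as [4, 1], the targets it gets compared against presumably need that shape too (for example [[0.], [1.], [1.], [0.]]), and they would have to be fed in alongside the inputs.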