How can I train a network in TensorFlow using minibatches of data? In the Deep-MNIST tutorial, they use:
for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
My question is: are x and y_ variables with dimensions suitable to a single example, and are batch[0], batch[1] lists of such inputs and outputs? In that case, will TensorFlow automatically add the gradients for each training example in these lists, or should I create my model so that x and y_ get an entire minibatch?
My problem is that when I try to feed a list for each placeholder, TensorFlow tries to input the entire list into the placeholder, and I therefore get a size mismatch: Cannot feed value of shape (n, m) for Tensor u'ts:0', which has shape '(m,)', where n is the minibatch size and m is the individual input size.
Thanks.
The results confirm that using small batch sizes achieves the best generalization performance, for a given computation cost. In all cases, the best results have been obtained with batch sizes of 32 or smaller. Often mini-batch sizes as small as 2 or 4 deliver optimal results.
Batch means that you use all your data to compute the gradient during one iteration. Mini-batch means you only take a subset of all your data during one iteration.
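To make the difference concrete, here is a minimal sketch (the data array and batch_size are illustrative, not from the tutorial):

import numpy as np

data = np.random.rand(1000, 784)  # illustrative dataset: 1000 examples, 784 features each

# Batch: a single gradient step would see all 1000 examples at once,
# e.g. feed_dict={x: data}

# Mini-batch: each gradient step sees only a slice of the data
batch_size = 50
for start in range(0, len(data), batch_size):
    mini_batch = data[start:start + batch_size]  # shape (50, 784)
    # e.g. feed_dict={x: mini_batch}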
On the other hand, a big batch size can really speed up your training, and can even give better generalization performance. A good way to find a suitable batch size is the Simple Noise Scale metric introduced in “An Empirical Model of Large-Batch Training”.
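As a rough summary of that metric (from memory; see the paper for the precise estimator): the simple noise scale is B_simple = tr(Σ) / |G|², where G is the true gradient of the loss and Σ is the covariance matrix of the per-example gradients. Batch sizes well below B_simple gain almost linear speedup as you grow the batch, while sizes far above it give diminishing returns per example.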
In the MNIST tutorial, x and y_ are placeholders with a defined shape:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
The shape=[None, 784] means that this placeholder has 2 dimensions.
So, to answer your first question:
are x and y_ variables with dimensions suitable to a single example
The first dimension can contain an undefined number of elements (so 1, 2, ... 50, ...) and the second dimension must contain exactly 784 = 28*28 elements (the features of a single MNIST image).
Feeding the graph a Python list (or numpy array) with shape [1, 784] or with shape [50, 784] is exactly the same for TensorFlow: it handles both without any problem.
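A self-contained sketch of that (the doubled op is just a stand-in for the tutorial's model):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
doubled = x * 2.0  # any op that consumes x; illustrative only

with tf.Session() as sess:
    one = np.random.rand(1, 784)     # a single example
    fifty = np.random.rand(50, 784)  # a mini-batch of 50 examples
    # the None dimension matches either batch size
    print(sess.run(doubled, feed_dict={x: one}).shape)    # (1, 784)
    print(sess.run(doubled, feed_dict={x: fifty}).shape)  # (50, 784)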
batch[0], batch[1] are lists of such inputs and outputs?

In the tutorial they define batch by calling batch = mnist.train.next_batch(50). Thus batch[0] is a list with shape [50, 784] (the 50 input images, one per row) and batch[1] is a list with shape [50, 10] (the corresponding one-hot labels).
will TensorFlow automatically add the gradients for each training example in these lists? or should I create my model so that x and y_ get an entire minibatch?
TensorFlow handles this for you: the tutorial's loss is reduced over the batch dimension, so a single training step already averages the gradient over every example in the mini-batch.
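For example, in the softmax-regression version of the tutorial the loss looks roughly like this (a sketch from memory, not a verbatim copy):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# per-example cross-entropy: one value per row of the feed, shape [None]
per_example = -tf.reduce_sum(y_ * tf.log(y), axis=1)
# reduce_mean collapses the batch dimension into a scalar, so the gradient
# the optimizer computes already averages over the whole mini-batch
cross_entropy = tf.reduce_mean(per_example)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)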
The error you're reporting, Cannot feed value of shape (n, m) for Tensor u'ts:0', which has shape '(m,)', is a shape mismatch: your placeholder was declared with the one-dimensional shape (m,) of a single example, so it cannot accept a whole mini-batch of shape (n, m). Either declare the placeholder with a leading None dimension, as in the tutorial, or reshape what you feed so it matches the placeholder's shape.
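A minimal sketch of the fix (ts comes from your error message; m = 784 is just MNIST's input size):

import numpy as np
import tensorflow as tf

# broken: shape (784,) accepts exactly one flat example
# ts = tf.placeholder(tf.float32, shape=[784], name='ts')

# fixed: a leading None lets the same placeholder take any mini-batch
ts = tf.placeholder(tf.float32, shape=[None, 784], name='ts')

batch = np.random.rand(50, 784)  # an (n, m) mini-batch
with tf.Session() as sess:
    print(sess.run(tf.shape(ts), feed_dict={ts: batch}))  # [50 784]

If the placeholder really must stay one-dimensional, the alternative is to feed one example at a time, with each feed having shape (784,).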