I have created a deep convolutional neural network to classify individual pixels in an image. My training data will always be the same size (32x32x7), but my testing data can be any size.
Github Repository
Currently, my model only works on images of one fixed size. I have used the TensorFlow MNIST tutorial extensively to help me construct my model. In that tutorial, we only use 28x28 images. How would the following MNIST model be changed to accept images of any size?
x = tf.placeholder(tf.float32, shape=[None, 784])   # flattened 28x28 input
y_ = tf.placeholder(tf.float32, shape=[None, 10])   # one-hot labels, 10 classes
W = tf.Variable(tf.zeros([784, 10]))                 # weights tied to the 784-pixel input
b = tf.Variable(tf.zeros([10]))
x_image = tf.reshape(x, [-1, 28, 28, 1])             # back to 28x28x1 for the conv layers
To make things a little more complicated, my model has transpose convolutions where the output shape needs to be specified. How would I adjust the following line of code so that the transpose convolution outputs a shape the same size as the input?
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
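One common workaround (a sketch only; x_volume is a hypothetical stand-in for the 5-D tensor the layer should upsample back to, not a name from the linked repository) is to build output_shape at run time from tf.shape instead of hard-coding it:
# Sketch: derive the output shape dynamically instead of hard-coding [1,32,32,7,1]
dyn = tf.shape(x_volume)                                   # [batch, d, h, w, channels] at run time
out_shape = tf.stack([dyn[0], dyn[1], dyn[2], dyn[3], 1])  # keep spatial dims, 1 output channel
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter=w, output_shape=out_shape,
                                  strides=[1, 2, 2, 2, 1], padding='SAME')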
Conventionally, when dealing with images of different sizes in a CNN (which happens very often in real-world problems), we either resize the images to the size of the smallest image with the help of an image-manipulation library (OpenCV, PIL, etc.) or pad the unequal-sized images up to the desired size; a rough sketch of both follows.
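A minimal sketch of the two pre-processing options, assuming H x W x C NumPy arrays (the target sizes and function names here are made up for illustration):
import numpy as np
import cv2

def resize_image(img, target_h, target_w):
    # cv2.resize takes the target size in (width, height) order
    return cv2.resize(img, (target_w, target_h), interpolation=cv2.INTER_AREA)

def pad_image(img, target_h, target_w):
    # zero-pad the bottom/right of the spatial dimensions up to the target size
    pad_h = max(target_h - img.shape[0], 0)
    pad_w = max(target_w - img.shape[1], 0)
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')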
Usually around 100 images are sufficient to train a class. If the images in a class are very similar, fewer images might be sufficient. It is important that the training images are representative of the variation typically found within the class.
100 images is quite low for a CNN. The appropriate number of samples depends on the specific problem and should be tested for each case individually, but a rough rule of thumb is to train a CNN on a data set of more than 5,000 samples for effective generalization.
Resizing images is a critical pre-processing step in computer vision, principally because deep learning models train faster on small images. Doubling the height and width of the input requires the network to learn from four times as many pixels, which increases training time.
Unfortunately there's no way to build dynamic graphs in TensorFlow (you could try TensorFlow Fold, but that's outside the scope of the question). This leaves you with two options:
Bucketing: you create multiple input tensors in a few hand-picked sizes and then at runtime you choose the right bucket (see the seq2seq-with-bucketing example; a rough sketch also follows the resizing code below). Either way you'll probably need the second option as well.
Resize the input and output images. Assuming the images all maintain the same aspect ratio, you can try resizing them before inference. Not sure why you care about the output, since MNIST is a classification task.
Either way you can use the same approach:
from PIL import Image
basewidth = 28  # MNIST image width
img = Image.open('your_input_img.jpg')
wpercent = basewidth / float(img.size[0])    # scale factor relative to the original width
hsize = int(float(img.size[1]) * wpercent)   # scale the height by the same factor
# Image.ANTIALIAS was removed in newer Pillow; LANCZOS is the equivalent filter
img = img.resize((basewidth, hsize), Image.LANCZOS)
# Save the image or feed it directly to TensorFlow
img.save('feed_to_tf.jpg')
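For the bucketing option mentioned above, a rough sketch (the bucket sizes, the 7-channel placeholder shape, and the pad-to-bucket policy are all made up for illustration):
# Sketch of bucketing: a few hand-picked spatial sizes, one placeholder per size,
# and each image is padded up to the smallest bucket it fits into at run time.
import numpy as np
import tensorflow as tf

bucket_sizes = [32, 64, 128]
placeholders = {s: tf.placeholder(tf.float32, shape=[None, s, s, 7]) for s in bucket_sizes}

def pick_bucket(img):
    # smallest bucket the image fits into (assumes it fits into the largest one)
    size = next(s for s in bucket_sizes if s >= max(img.shape[0], img.shape[1]))
    pad_h, pad_w = size - img.shape[0], size - img.shape[1]
    return size, np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')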
The MNIST model code you mentioned is an example of a fully connected (FC) network, not a convolutional one. The input shape of [None, 784] corresponds to the MNIST image size (28 x 28), and such an FC network has a fixed input size.
What you are asking for is not possible with FC networks because the number of weights and biases depends on the input shape. It is possible with a fully convolutional architecture, so my suggestion is to use a fully convolutional network, where the weights and biases do not depend on the input shape (see the sketch below).
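A minimal sketch of that idea for the 7-band data in the question (the layer widths and class count are made up; the point is the [None, None, None, 7] placeholder and the absence of any dense layer):
# Fully convolutional sketch: no flattening or dense layers, so any H x W works.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, None, None, 7])   # any height/width, 7 bands
conv1 = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
conv2 = tf.layers.conv2d(conv1, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
logits = tf.layers.conv2d(conv2, filters=2, kernel_size=1, padding='same')  # per-pixel class scores
If the transpose convolutions are built with the tf.layers API (tf.layers.conv3d_transpose / conv2d_transpose) instead of tf.nn.conv3d_transpose, the output shape is inferred automatically, so the hard-coded [1,32,32,7,1] is no longer needed.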