Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tensorflow CNN training images are all different sizes

I have created a Deep Convolution Neural Network to classify individual pixels in an image. My training data will always be the same size (32x32x7), but my testing data can be any size.

Github Repository

Currently, my model will only work on images that are the same size. I have used the tensorflow mnist tutorial extensively to help me construct my model. In this tutorial, we only use 28x28 images. How would the following mnist model be changed to accept images of any size?

 x = tf.placeholder(tf.float32, shape=[None, 784])
 y_ = tf.placeholder(tf.float32, shape=[None, 10])
 W = tf.Variable(tf.zeros([784,10]))
 b = tf.Variable(tf.zeros([10]))
 x_image = tf.reshape(x, [-1, 28, 28, 1])

To make things a little bit more complicated, my model has transpose convolutions where the output shape needs to be specified. How would I adjust the following line of code so that the transpose convolution will output a shape that is the same size of the input.

  DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')     
like image 627
Devin Haslam Avatar asked Dec 21 '17 17:12

Devin Haslam


People also ask

How does CNN handle photos of different sizes?

Conventionally, when dealing with images of different sizes in CNN(which happens very often in real world problems), we resize the images to the size of the smallest images with the help of any image manipulation library (OpenCV, PIL etc) or some times, pad the images of unequal size to desired size.

How many images per class are sufficient for training a CNN?

Usually around 100 images are sufficient to train a class. If the images in a class are very similar, fewer images might be sufficient. the training images are representative of the variation typically found within the class.

How many images are required to train CNN?

100 number of images is quite low for a CNN algorithm. Appropriate number of samples depends on the specific problem, and it should be tested for each case individually. But a rough rule of thumb is to train a CNN algorithm with a data set larger than 5,000 samples for effective generalization of the problem.

Why do we resize images in deep learning?

Resizing images is a critical pre-processing step in computer vision. Principally, deep learning models train faster on small images. A larger input image requires the neural network to learn from four times as many pixels, and this increase the training time for the architecture.


2 Answers

Unfortunately there's no way to build dynamic graphs in Tensorflow (You could try with fold but that's outside the scope of the question). This leaves you with two options:

  1. Bucketing: You create multiple input tensors in a few hand picked sizes and then in runtime you choose the right bucket (see example). Either way you'll probably need the second option. Seq2seq with bucketing

  2. Resize the input and output images. Assuming the images all maintain the same aspect ration you can try resizing the image before inference. Not sure why you care about the output since MNIST is a classification task.

Either way you can use the same approach:

from PIL import Image

basewidth = 28 # MNIST image width
img = Image.open('your_input_img.jpg')
wpercent = (basewidth/float(img.size[0]))
hsize = int((float(img.size[1])*float(wpercent)))
img = img.resize((basewidth,hsize), Image.ANTIALIAS)

# Save image or feed directly to tensorflow 
img.save('feed_to_tf.jpg') 
like image 137
gidim Avatar answered Sep 30 '22 18:09

gidim


The mnist model code which you mentioned is an example using FC networks and not for convolution networks. The input shape of [None,784] is given for mnist size (28 x 28). The example is a FC network which has fixed input size.

What you are asking for is not possible in FC networks because the number of weights and biases are dependent on the input shape. This is possible if you are using a Fully convolution architecture. So my suggestion is to use a fully convolution architecture so that the weights and biases are not dependent on the input shape

like image 27
Abhijit Balaji Avatar answered Sep 30 '22 18:09

Abhijit Balaji