I am trying to train a model that classifies images. The problem is that the images have different sizes. How should I format my images or change my model architecture?

How to format the image data for training/prediction when images are different in size?

2 Answers

You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.

If, for example, your network only contains convolutional units - that is to say, it does not contain fully connected layers - it can accept input images of any size, because the same filters are simply slid over a larger or smaller image. Such a network could process the input images and in turn return another image ("convolutional all the way"), whose size then depends on the input size; you would have to make sure that the output matches what you expect, since you have to compute the loss against it in some way.
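To make that concrete, here is a minimal pure-Python sketch (no framework, single-channel images as lists of rows). It adds one thing the answer does not mention: global average pooling after the convolution, which is one common way to collapse the variable-sized output into a fixed-length vector. The kernels and image sizes are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` (list of rows) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def global_average_pool(feature_map):
    """Collapse a 2-D feature map of any size to a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def tiny_fully_conv_net(image, kernels):
    """One conv layer with several kernels, then global average pooling.

    The output length equals len(kernels), whatever the image size is.
    """
    return [global_average_pool(conv2d_valid(image, k)) for k in kernels]


# Two hypothetical 3x3 kernels: a mean filter and a horizontal-gradient filter.
KERNELS = [
    [[1 / 9] * 3 for _ in range(3)],
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
]

small = [[float(r + c) for c in range(5)] for r in range(4)]    # 4 x 5 image
large = [[float(r + c) for c in range(12)] for r in range(9)]   # 9 x 12 image

# Both calls return one value per kernel, despite the different input sizes.
print(len(tiny_fully_conv_net(small, KERNELS)))
print(len(tiny_fully_conv_net(large, KERNELS)))
```

The intermediate feature maps differ in size (2 x 3 versus 7 x 10 here); only the pooling step makes the final output length independent of the input.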

If you are using fully connected units, though, you're in for trouble: a fully connected layer has a fixed number of learned weights, one per input feature per unit, so inputs of varying size would require a varying number of weights - and that's not possible.
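A quick back-of-the-envelope calculation shows the mismatch. This is only a sketch: the two-layer conv stack and the 10-unit fully connected layer are hypothetical, and the output-size formula is the usual one for a convolution with a given kernel, stride and padding.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, channels, layers):
    """Number of values fed to the first fully connected layer.

    `layers` is a list of (kernel, stride, padding, out_channels) tuples.
    """
    for kernel, stride, padding, out_channels in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        channels = out_channels
    return height * width * channels


# Hypothetical conv stack: two 3x3, stride-2 layers with 16 and 32 filters.
LAYERS = [(3, 2, 1, 16), (3, 2, 1, 32)]
FC_UNITS = 10  # hypothetical size of the first fully connected layer

for h, w in [(224, 224), (180, 240)]:
    n = flattened_features(h, w, 3, LAYERS)
    print(f"{h}x{w} input: {n} flattened features, {n * FC_UNITS} FC weights")
```

The two input sizes call for different weight-matrix shapes, which a single trained layer cannot provide.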

If that is your problem, here are some things you can do:

- Don't care about squashing the images: resize them all to one fixed size and ignore the aspect ratio. A network might learn to make sense of the content anyway; do scale and perspective even matter for your content?
- Center-crop the images to a specific size. If you fear you're losing data, take multiple crops and use them to augment your input data, so that the original image is split into N different images of the correct size.
- Pad the images with a solid color to a square, then resize.
- Do a combination of the above.
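The squashing and cropping options can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a recommended pipeline: images are single-channel lists of rows, the resize is nearest-neighbour for brevity, the crops assume the image is at least as large as the crop, and the "four corners plus center" scheme is just one possible way to pick N crops. In practice you would use your framework's image ops.

```python
def resize_nearest(image, out_h, out_w):
    """Squash an image to out_h x out_w, ignoring the aspect ratio."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def crop(image, top, left, crop_h, crop_w):
    """Cut a crop_h x crop_w window whose top-left corner is (top, left)."""
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def center_crop(image, crop_h, crop_w):
    """Cut a window from the middle of the image."""
    top = (len(image) - crop_h) // 2
    left = (len(image[0]) - crop_w) // 2
    return crop(image, top, left, crop_h, crop_w)


def five_crops(image, crop_h, crop_w):
    """Four corner crops plus the center crop, all of the same size."""
    in_h, in_w = len(image), len(image[0])
    corners = [(0, 0), (0, in_w - crop_w),
               (in_h - crop_h, 0), (in_h - crop_h, in_w - crop_w)]
    crops = [crop(image, top, left, crop_h, crop_w) for top, left in corners]
    crops.append(center_crop(image, crop_h, crop_w))
    return crops


# A 6 x 8 test image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(6)]

squashed = resize_nearest(image, 3, 4)
print(len(squashed), len(squashed[0]))   # height and width after squashing
print(len(five_crops(image, 4, 4)))      # number of augmented crops
```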

The padding option might introduce an additional source of error in the network's predictions, as the network might (read: likely will) become biased toward images that contain such a padded border. If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are helpers like resize_image_with_crop_or_pad (named tf.image.resize_with_crop_or_pad in TensorFlow 2) that take care of most of the work.
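For a feel of what padding and a crop-or-pad helper do, here is a plain-Python sketch. It is not the TensorFlow implementation: it only imitates the general behaviour (crop centrally where the image is too large, pad evenly where it is too small), the function names are my own, and the rounding of odd offsets may differ from the library.

```python
def pad_to_square(image, fill=0):
    """Pad the shorter side with `fill` so the image becomes square."""
    in_h, in_w = len(image), len(image[0])
    side = max(in_h, in_w)
    top = (side - in_h) // 2
    left = (side - in_w) // 2
    out = [[fill] * side for _ in range(side)]
    for r in range(in_h):
        out[top + r][left:left + in_w] = image[r]
    return out


def crop_or_pad(image, target_h, target_w, fill=0):
    """Center-crop dimensions that are too large, pad those that are too small."""
    in_h, in_w = len(image), len(image[0])
    kept_h, kept_w = min(in_h, target_h), min(in_w, target_w)
    crop_top = (in_h - kept_h) // 2
    crop_left = (in_w - kept_w) // 2
    cropped = [row[crop_left:crop_left + kept_w]
               for row in image[crop_top:crop_top + kept_h]]
    pad_top = (target_h - kept_h) // 2
    pad_left = (target_w - kept_w) // 2
    out = [[fill] * target_w for _ in range(target_h)]
    for r, row in enumerate(cropped):
        out[pad_top + r][pad_left:pad_left + kept_w] = row
    return out


wide = [[1] * 6 for _ in range(2)]       # a 2 x 6 image of ones
for row in crop_or_pad(wide, 4, 4):      # width is cropped, height is padded
    print(row)
```

The rows of fill values in the result are exactly the border the network may latch onto.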

As for simply not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:

    # This resizing operation may distort the images because the aspect
    # ratio is not respected. We select a resize method in a round robin
    # fashion based on the thread number.
    # Note that ResizeMethod contains 4 enumerated resizing methods.

    # We select only 1 case for fast_mode bilinear.
    num_resize_cases = 1 if fast_mode else 4
    distorted_image = apply_with_random_selector(
        distorted_image,
        lambda x, method: tf.image.resize_images(x, [height, width], method=method),
        num_cases=num_resize_cases)

They're totally aware of it and do it anyway.

Depending on how far you want or need to go, there is also a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary size: it pools the last convolutional feature maps into a fixed number of spatial bins, so the fully connected layers always receive a vector of the same length.
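The core idea of that pooling step fits in a short sketch. This is a simplified illustration, not the paper's exact formulation: it works on one single-channel feature map given as a list of rows, the pyramid levels (1, 2, 4) are an arbitrary choice, and the bin boundaries are computed with a simple floor/ceil rule.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into n x n bins for every n in `levels`.

    The result always has sum(n * n for n in levels) entries, regardless
    of the feature map's height and width.
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor the start and ceil the end so bins cover the whole map
            # and are never empty (they may overlap on small maps).
            r0, r1 = (i * in_h) // n, ceil_div((i + 1) * in_h, n)
            for j in range(n):
                c0, c1 = (j * in_w) // n, ceil_div((j + 1) * in_w, n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


map_a = [[r * 7 + c for c in range(7)] for r in range(5)]     # 5 x 7 map
map_b = [[r * 9 + c for c in range(9)] for r in range(13)]    # 13 x 9 map

# Same output length for both maps: 1 + 4 + 16 bins.
print(len(spatial_pyramid_pool(map_a)), len(spatial_pyramid_pool(map_b)))
```

With several channels you would apply this per channel and concatenate the results, so the vector length becomes the number of channels times the number of bins.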


answered Oct 02 '22 20:10

sunside

Try making a spatial pyramid pooling layer, then put it after your last convolution layer so that the FC layers always get vectors of constant dimension as input. During training, train on the images from the entire dataset at one particular image size for one epoch. Then, for the next epoch, switch to a different image size and continue training.
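The epoch-by-epoch size switching could be organised roughly like this. It is only a skeleton: the sizes 180 and 224 are example values, and `fake_train_one_epoch` is a hypothetical placeholder for whatever resizes your dataset to the given size and runs one pass over it.

```python
from itertools import cycle, islice


def size_schedule(sizes, num_epochs):
    """Cycle through the given input sizes, one size per epoch."""
    return list(islice(cycle(sizes), num_epochs))


def train(num_epochs, sizes, train_one_epoch):
    """Run `train_one_epoch(size)` once per epoch with the scheduled size."""
    history = []
    for epoch, size in enumerate(size_schedule(sizes, num_epochs)):
        history.append((epoch, size, train_one_epoch(size)))
    return history


def fake_train_one_epoch(size):
    """Placeholder: resize all images to size x size, then train one pass."""
    return f"trained on {size}x{size} inputs"


for epoch, size, result in train(4, (180, 224), fake_train_one_epoch):
    print(epoch, result)
```

Keeping one size per epoch means every batch within an epoch has a uniform shape, which is what most frameworks expect.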

answered Oct 02 '22 21:10

Pranay Mukherjee

Tags:

deep-learning

Asif Mohammed
