I'm using a style-transfer-based deep learning approach that uses VGG (a neural network). It works well with small images (512x512 pixels), but it produces distorted results when the input images are large (size > 1500px). The author of the approach suggested dividing the large input image into portions, performing style transfer on portion 1 and then on portion 2, and finally concatenating the two portions into the final large result image, because VGG was made for small images... The problem with this approach is that the resulting image has inconsistent regions around the areas where the portions were "glued" together. How can I correct these areas? Is there an alternative to this dividing method?
Welcome to SO, jeanluc. Great first question.
When you say VGG, I expect you're referring to VGG-16. This architecture uses fully connected layers at the end, which means you can only use it with images of a certain size. I believe the ImageNet default is 224x224 pixels.
If you want to use VGG-16 without modifications, you MUST use images of this size. However, many people remove the fully connected layers at the end (especially in the context of style transfer) so they can feed in any size they want.
Any size? Well, you probably want to make sure that the image dimensions are multiples of 32, because VGG-16 comes with 5 MaxPooling operations that halve the dimensions every time.
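For illustration, here's a minimal sketch of that setup, assuming PyTorch/torchvision (you didn't say which framework you're on): load VGG-16, keep only the convolutional part, and follow the multiple-of-32 recommendation.

```python
# Minimal sketch, assuming PyTorch/torchvision: keep only VGG-16's convolutional
# "features" part and drop the fully connected classifier so arbitrary input
# sizes are accepted.
import torch
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()

# Following the multiple-of-32 recommendation above (5 MaxPool halvings).
h, w = 1504, 2048                      # example dimensions, both divisible by 32
assert h % 32 == 0 and w % 32 == 0

x = torch.randn(1, 3, h, w)            # dummy RGB image tensor
with torch.no_grad():
    out = features(x)
print(out.shape)                       # torch.Size([1, 512, 47, 64])
```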
But just because the network can now digest images of any size doesn't mean the predictions will be meaningful. VGG-16 learned what 1000 different objects look like at a scale of 224px. Using a 1500px image of a cat might not activate the cat-related neurons. Is that a problem?
It depends on your use case. I wouldn't trust VGG-16 to classify these high-resolution images in the context of ImageNet, but that is not what you're after. You want to use a pretrained VGG-16 because it should have learned some abilities that come in handy in the context of style transfer. And this is usually true no matter the size of your input. It's almost always preferable to start with a pretrained model rather than from scratch. You probably want to think about finetuning this model for your task because A) style transfer is quite different from classification and B) you're using a completely different scale of images.
I've never found this recommended patch-based approach to help, precisely because of the problems you're experiencing. While CNNs learn to recognize local patterns in an image, they also learn global distributions, which is why this doesn't work nicely. You can always try to merge patches using interpolation techniques, but personally I wouldn't waste time on that.
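For completeness, here's roughly what that interpolation-based merging could look like. This is a NumPy sketch with illustrative names (none of this is from your author's method): linearly blend two horizontally adjacent stylised patches over a shared overlap band so the seam fades instead of showing a hard edge.

```python
# Rough sketch: linear (alpha) blending of two stylised patches across an
# overlap band. `left` and `right` are illustrative names, not from the post.
import numpy as np

def blend_horizontal(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """left, right: (H, W, 3) float arrays whose last/first `overlap` columns
    cover the same region of the original image."""
    alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]   # weight goes 1 -> 0 across the seam
    seam = alpha * left[:, -overlap:] + (1.0 - alpha) * right[:, :overlap]
    return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]], axis=1)
```

Even with blending, the two patches were stylised independently, so global inconsistencies (colour shifts, texture orientation) can survive the smoothing, which is why I wouldn't invest much in this route.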
Instead, just feed in the full image like you mentioned, which should work once you've removed the fully connected layers. The scale will be off, but there's little you can do if you really want high-resolution inputs. Finetune VGG-16 so it can adapt to your use case at hand.
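If you do go the finetuning route, a rough sketch (again assuming PyTorch/torchvision) is to freeze the early convolutional blocks and leave the later ones trainable so the network can adapt to the larger image scale. The cut-off index 17 (start of the fourth conv block) is an illustrative choice, not something prescribed anywhere.

```python
# Rough sketch: freeze early VGG-16 conv blocks, train only the later ones.
import torch
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features

for i, layer in enumerate(features):
    for p in layer.parameters():
        p.requires_grad = i >= 17      # train only conv blocks 4 and 5

trainable = [p for p in features.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
# ...use `features` inside your style-transfer training loop and step `optimizer`.
```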
In case you don't want to finetune, I don't think there's anything else you can do. Use the transformation/scale the network was trained on, or accept less-than-optimal performance when you change the resolution.