I'm using a style-transfer-based deep learning approach that uses VGG (a neural network). It works well with small images (512x512 pixels), but it produces distorted results when the input images are large (size > 1500px). The author of the approach suggested dividing the large input image into portions, performing style transfer on portion 1 and then on portion 2, and finally concatenating the two portions into the final large result image, because VGG was made for small images... The problem with this approach is that the resulting image has inconsistent regions around the areas where the portions were "glued" together. How can I correct these areas? Is there an alternative to this dividing method?
Welcome to SO, jeanluc. Great first question.
When you say VGG, I expect you're referring to VGG-16. This architecture uses fully connected layers at the end, which means you can only use it with images of a certain size. I believe the ImageNet default is 224x224 pixels.
If you want to use VGG-16 without modifications, you MUST use images of this size. However, many people remove the fully connected layers at the end (especially in the context of style transfer) so they can feed in any size they want.
Any size? Well, you probably want to make sure that the image dimensions are multiples of 32, because VGG-16 comes with 5 MaxPooling operations that halve the dimensions every time.
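For illustration, here's a minimal sketch of that setup, assuming PyTorch/torchvision (you didn't say which framework you're on): load VGG-16, keep only the convolutional part, and follow the multiple-of-32 recommendation.

```python
# Minimal sketch, assuming PyTorch/torchvision: keep only VGG-16's convolutional
# "features" part and drop the fully connected classifier so arbitrary input
# sizes are accepted.
import torch
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()

# Following the multiple-of-32 recommendation above (5 MaxPool halvings).
h, w = 1504, 2048                      # example dimensions, both divisible by 32
assert h % 32 == 0 and w % 32 == 0

x = torch.randn(1, 3, h, w)            # dummy RGB image tensor
with torch.no_grad():
    out = features(x)
print(out.shape)                       # torch.Size([1, 512, 47, 64])
```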
But just because the network can now digest images of any size doesn't mean the predictions will be meaningful. VGG-16 learned what 1000 different objects look like at a scale of 224px. Using a 1500px image of a cat might not activate the cat-related neurons. Is that a problem?
It depends on your use case. I wouldn't trust VGG-16 to classify these high-resolution images in the context of ImageNet, but that is not what you're after. You want to use a pretrained VGG-16 because it should have learned some abilities that come in handy in the context of style transfer. And this is usually true no matter the size of your input. It's almost always preferable to start with a pretrained model rather than from scratch. You probably want to think about finetuning this model for your task because A) style transfer is quite different from classification and B) you're using a completely different scale of images.
I've never found this recommended patch-based approach to help, precisely because of the problems you're experiencing. While CNNs learn to recognize local patterns in an image, they also learn global distributions, which is why this doesn't work nicely. You can always try to merge patches using interpolation techniques, but personally I wouldn't waste time on that.
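For completeness, here's roughly what that interpolation-based merging could look like. This is a NumPy sketch with illustrative names (none of this is from your author's method): linearly blend two horizontally adjacent stylised patches over a shared overlap band so the seam fades instead of showing a hard edge.

```python
# Rough sketch: linear (alpha) blending of two stylised patches across an
# overlap band. `left` and `right` are illustrative names, not from the post.
import numpy as np

def blend_horizontal(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """left, right: (H, W, 3) float arrays whose last/first `overlap` columns
    cover the same region of the original image."""
    alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]   # weight goes 1 -> 0 across the seam
    seam = alpha * left[:, -overlap:] + (1.0 - alpha) * right[:, :overlap]
    return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]], axis=1)
```

Even with blending, the two patches were stylised independently, so global inconsistencies (colour shifts, texture orientation) can survive the smoothing, which is why I wouldn't invest much in this route.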
Instead, just feed in the full image like you mentioned, which should work once you've removed the fully connected layers. The scale will be off, but there's little you can do if you really want high-resolution inputs. Finetune VGG-16 so it can adapt to your use case at hand.
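If you do go the finetuning route, a rough sketch (again assuming PyTorch/torchvision) is to freeze the early convolutional blocks and leave the later ones trainable so the network can adapt to the larger image scale. The cut-off index 17 (start of the fourth conv block) is an illustrative choice, not something prescribed anywhere.

```python
# Rough sketch: freeze early VGG-16 conv blocks, train only the later ones.
import torch
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features

for i, layer in enumerate(features):
    for p in layer.parameters():
        p.requires_grad = i >= 17      # train only conv blocks 4 and 5

trainable = [p for p in features.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
# ...use `features` inside your style-transfer training loop and step `optimizer`.
```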
In case you don't want to finetune, I don't think there's anything else you can do. Use the transformation/scale the network was trained on, or accept less-than-optimal performance when you change the resolution.