Performing data augmentation for a classification task is easy, as most transforms do not change the ground-truth label of the image.
However, in the case of object localization:
I am unable to understand how such cases are handled in object localization. Most papers suggest using multi-scale training but don't address these issues.
Augmentation methods alter the content of the bounding box. Color augmentations change the pixel distribution but leave the coordinates of the bounding box unchanged. Geometric augmentations such as cropping or scaling change both the pixel distribution and the coordinates of the bounding box. Those changes must be written back to the annotation files so that the training algorithm reads the updated boxes.
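To make the coordinate bookkeeping concrete, here is a minimal sketch in plain Python. It assumes boxes are stored as `(x_min, y_min, x_max, y_max)` in pixels; the function names and the `min_visible` threshold are my own illustrative choices, not part of any particular library, and dropping a box that is mostly cropped out is just one possible policy.

```python
from typing import Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def hflip_box(box: Box, image_width: float) -> Box:
    """Mirror a box for a horizontal flip of an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def scale_box(box: Box, sx: float, sy: float) -> Box:
    """Rescale a box when the image is resized by factors sx (width) and sy (height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


def crop_box(box: Box, crop: Box, min_visible: float = 0.5) -> Optional[Box]:
    """Express a box in the coordinates of a crop window.

    The box is clipped to the crop window. If less than `min_visible`
    (a hypothetical threshold) of its original area remains, the box is
    dropped and None is returned.
    """
    x_min, y_min, x_max, y_max = box
    cx_min, cy_min, cx_max, cy_max = crop

    # Intersection of the box with the crop window, in original image coordinates.
    ix_min, iy_min = max(x_min, cx_min), max(y_min, cy_min)
    ix_max, iy_max = min(x_max, cx_max), min(y_max, cy_max)
    if ix_max <= ix_min or iy_max <= iy_min:
        return None  # the object lies entirely outside the crop

    area = (x_max - x_min) * (y_max - y_min)
    visible = (ix_max - ix_min) * (iy_max - iy_min)
    if area <= 0 or visible / area < min_visible:
        return None  # too little of the object is left to keep the annotation

    # Shift so that the crop's top-left corner becomes the origin.
    return (ix_min - cx_min, iy_min - cy_min, ix_max - cx_min, iy_max - cy_min)
```

A color augmentation would need none of these functions, since the box stays where it was. For geometric transforms, the returned boxes (with `None` entries removed) are what you would write back to the annotation file alongside the augmented image.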
Custom scripts are a common way to solve this problem. However, I have a library in my repository that may help you: https://github.com/lozuwa/impy . With this library you can perform the operations I described above.