
How to apply imgaug augmentation to a tf.data.Dataset in TensorFlow 2.0

I have an application with an input pipeline that uses a tf.data.Dataset of images and labels. Now I would like to add augmentation, and I'm trying to use the imgaug library for that purpose. However, I do not know how to do that: all the examples I have found use Keras' ImageDataGenerator or Sequence.

In code, given a sequential augmenter like this:

import imgaug.augmenters as iaa

self.augmenter = iaa.Sequential([
    iaa.Fliplr(config.sometimes),
    iaa.Crop(percent=config.crop_percent),
    ...
], random_order=config.random_order)

I am trying to apply that augmenter to batches of images in my dataset, without success. It seems that I cannot evaluate tensors, since my augmentation runs inside a map function.

def augment_dataset(self, dataset):
    dataset = dataset.map(self.augment_fn())
    return dataset

def augment_fn(self):
    def augment(images, labels):
        img_array = tf.make_ndarray(images)
        images = self.augmenter.augment_images(img_array) 
        return images, labels
    return augment

For example, if I try to use tf.make_ndarray, I get AttributeError: 'Tensor' object has no attribute 'tensor_shape'.

Is this because Dataset.map does not run in eager mode? Any ideas on how to approach this?
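For context on the error itself: tf.make_ndarray converts a TensorProto (the protocol-buffer form of a tensor) into a NumPy array and reads the proto's tensor_shape field, which a tf.Tensor does not have. The minimal sketch below shows the two conversions that do work outside of Dataset.map: make_ndarray on a proto built with tf.make_tensor_proto, and .numpy() on an eager tensor. The array values are illustrative only.

```python
import numpy as np
import tensorflow as tf

# tf.make_ndarray expects a TensorProto, e.g. one built with tf.make_tensor_proto.
proto = tf.make_tensor_proto(np.arange(6, dtype=np.float32).reshape(2, 3))
from_proto = tf.make_ndarray(proto)

# For an eager tf.Tensor, .numpy() is the usual conversion.
eager = tf.constant([[1.0, 2.0, 3.0]])
from_eager = eager.numpy()

print(from_proto.shape, from_eager.shape)  # (2, 3) (1, 3)
```

Neither conversion helps inside Dataset.map, where images is a symbolic tensor with no concrete value yet.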

Update #1

I tried the suggested tf.numpy_function, as follows:

def augment_fn(self):
    def augment(images, labels):
        images = tf.numpy_function(self.augmenter.augment_images,
                                   [images],
                                   images.dtype)
        return images, labels
    return augment

However, the resulting images have an unknown shape, which causes other errors later on. How can I keep the original shape of the images? Before applying the augmentation function, my batch of images has shape (batch_size, None, None, 1), but afterwards the shape is <unknown>.

Update #2

I solved the issue with the unknown shape by first finding the dynamic (true) shape of the images and then reshaping the result of applying the augmentation.

def augment_fn(self):
    def augment(images, labels):
        img_dtype = images.dtype
        img_shape = tf.shape(images)  # dynamic shape, evaluated at run time
        images = tf.numpy_function(self.augmenter.augment_images,
                                   [images],
                                   img_dtype)
        # numpy_function returns a tensor of unknown shape; restore it
        images = tf.reshape(images, shape=img_shape)
        return images, labels
    return augment
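A self-contained sketch of the Update #2 pattern, so it can be run without imgaug: a hypothetical NumPy function, random_fliplr_batch, stands in for self.augmenter.augment_images and flips each image of the batch left-right with probability 0.5. The image size (32x32x1) and batch size (4) are arbitrary assumptions for the example.

```python
import numpy as np
import tensorflow as tf

def random_fliplr_batch(images):
    # Hypothetical stand-in for imgaug's augment_images: flips each image in the
    # batch left-right with probability 0.5. Receives and returns NumPy arrays.
    flip = np.random.rand(images.shape[0]) < 0.5
    out = images.copy()
    out[flip] = out[flip, :, ::-1, :]
    return out

def augment(images, labels):
    img_shape = tf.shape(images)  # dynamic shape, evaluated at run time
    images = tf.numpy_function(random_fliplr_batch, [images], images.dtype)
    # numpy_function output has unknown shape; reshape to restore it
    images = tf.reshape(images, shape=img_shape)
    return images, labels

images = np.random.rand(8, 32, 32, 1).astype(np.float32)
labels = np.arange(8)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4).map(augment)
print(dataset.element_spec)
```

An alternative to the extra reshape is to capture the static shape before the call (images.shape) and apply it afterwards with set_shape, which is what the first answer does.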
magomar asked Aug 06 '19 11:08


2 Answers

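Please see the tf.data documentation for why you need to restore the shape of your images when you use tf.py_function: TensorFlow cannot infer the shape of whatever the wrapped Python function returns, so the output's static shape is unknown until you set it.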

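import tensorflow as tf

# random_rotate_image is a plain Python function that takes and returns an image,
# e.g. one that rotates it with scipy.ndimage.rotate.
def tf_random_rotate_image(image, label):
    im_shape = image.shape  # static shape, known when map() traces this function
    [image,] = tf.py_function(random_rotate_image, [image], [tf.float32])
    image.set_shape(im_shape)  # py_function output has unknown shape; restore it
    return image, label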
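A runnable sketch of the same set_shape pattern, assuming a hypothetical random_rotate_image that rotates by a random multiple of 90 degrees with np.rot90 (instead of an arbitrary angle), and square 16x16x1 images so the rotation keeps the shape. Because tf.py_function passes eager tensors to the wrapped function, .numpy() can be called inside it.

```python
import numpy as np
import tensorflow as tf

def random_rotate_image(image):
    # Hypothetical stand-in: rotates by a random multiple of 90 degrees using
    # np.rot90 (keeps the shape only for square images). tf.py_function passes
    # eager tensors, so .numpy() is available here.
    k = np.random.randint(4)
    return np.rot90(image.numpy(), k=k, axes=(0, 1)).copy()

def tf_random_rotate_image(image, label):
    im_shape = image.shape  # static shape captured while map() traces this function
    [image,] = tf.py_function(random_rotate_image, [image], [tf.float32])
    image.set_shape(im_shape)  # restore the shape that py_function cannot infer
    return image, label

images = np.random.rand(3, 16, 16, 1).astype(np.float32)
labels = np.arange(3)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(tf_random_rotate_image)
print(dataset.element_spec)
```

Without the set_shape call, element_spec would report an unknown shape for the image, which is the problem described in Update #1.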
Oscar Fabio Tokunaga Herrera answered Nov 02 '22 01:11


Is this due to not using eager mode? I thought eager mode was the default in TF 2.0. Any ideas on how to approach this?

Yes, Dataset preprocessing is not executed in eager mode: the function you pass to Dataset.map is traced into a graph. This is, I assume, deliberate, and it certainly makes sense if you consider that Datasets can represent arbitrarily large (even infinite) streams of data.
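One way to check this is tf.executing_eagerly(), which reports whether the current code runs eagerly. The minimal sketch below records its value from inside a function passed to Dataset.map; the probe function and the observed list are illustrative names, not part of any API.

```python
import tensorflow as tf

observed = []

def probe(x):
    # Python side effects like this run only while tf.data traces the function,
    # not once per element.
    observed.append(tf.executing_eagerly())
    return x * 2

print(tf.executing_eagerly())  # True at top level in TF 2.x
dataset = tf.data.Dataset.range(5).map(probe)
values = list(dataset.as_numpy_iterator())
print(observed, values)
```

Inside the map function the recorded value is False, and the list does not grow with the number of elements, because the Python code only runs during tracing.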

Assuming that it is not possible or practical for you to translate the augmentation you are doing into TensorFlow operations (which would be the first choice!), you can use tf.numpy_function to execute arbitrary Python code (this is a replacement for the now-deprecated tf.py_func).
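For the "first choice" route, a sketch of rough TF-native counterparts of iaa.Fliplr and iaa.Crop (with the output resized back to the input size) using tf.image ops, so no Python code runs per element. This is an approximation, not a port of imgaug's behaviour; the fixed sizes IMG_SIZE and CROP_SIZE are hypothetical, and for variable-size images such as (None, None, 1) the crop size would have to be computed from tf.shape instead.

```python
import tensorflow as tf

IMG_SIZE, CROP_SIZE = 32, 28  # hypothetical sizes for this sketch

def augment(image, label):
    # Random horizontal flip (roughly iaa.Fliplr(0.5)).
    image = tf.image.random_flip_left_right(image)
    # Random crop, then resize back to the original size (roughly iaa.Crop with keep_size).
    image = tf.image.random_crop(image, size=[CROP_SIZE, CROP_SIZE, 1])
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return image, label

images = tf.random.uniform((6, IMG_SIZE, IMG_SIZE, 1))
labels = tf.range(6)
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .map(augment)
           .batch(3))
print(dataset.element_spec)
```

Because every step is a TensorFlow op, the static shape is preserved and no set_shape or reshape is needed.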

Stewart_R answered Nov 02 '22 01:11