To create a class label in CutMix- or MixUp-style augmentation, we can draw a mixing coefficient beta from a Beta distribution (for example with np.random.beta or scipy.stats.beta) and combine two labels as follows:
label = label_one*beta + (1-beta)*label_two
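As a minimal sketch of the two-label case, assuming one-hot labels and α = 1 (the helper name `mix_two_labels` and the class indices are illustrative, not from any library):

```python
import numpy as np

def mix_two_labels(label_one, label_two, alpha=1.0, rng=None):
    """Blend two one-hot labels with a coefficient drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    return lam * label_one + (1.0 - lam) * label_two, lam

num_classes = 10
label_one = np.eye(num_classes)[3]  # one-hot label for class 3
label_two = np.eye(num_classes)[7]  # one-hot label for class 7
mixed, lam = mix_two_labels(label_one, label_two, rng=np.random.default_rng(0))
```

The mixed label keeps weight `lam` on class 3 and `1 - lam` on class 7, so it still sums to 1.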
But what if we have more than two images? YOLOv4 introduced an interesting augmentation for object detection problems called Mosaic augmentation. Unlike CutMix or MixUp, this augmentation builds each augmented sample from 4 images. In object detection, we can compute the shift of each instance's coordinates, so it is possible to recover the proper ground truth. But for plain image classification, how can we do that?
Here is a starter.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import random

(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.cifar10.load_data()
train_images = train_images[:10,:,:]
train_labels = train_labels[:10]
train_images.shape, train_labels.shape
((10, 32, 32, 3), (10, 1))
Here is a function we've written for this augmentation (too ugly, with a nested inner/outer loop! Please suggest a more efficient way to do it).
def mosaicmix(image, label, DIM, minfrac=0.25, maxfrac=0.75):
    '''image, label: batches of samples
    '''
    xc, yc = np.random.randint(int(DIM * minfrac), int(DIM * maxfrac), (2,))
    indices = np.random.permutation(int(image.shape[0]))
    final_imgs = []
    # Iterate over the full indices
    for j in range(len(indices)):
        # Start from a fresh canvas so earlier mosaics are not overwritten
        mosaic_image = np.zeros((DIM, DIM, 3), dtype=np.float32)
        # Take 4 samples at random to create one mosaic sample
        rand4indices = [j] + random.sample(list(indices), 3)
        # Make mosaic with 4 samples
        for i, src in enumerate(rand4indices):
            if i == 0:    # top left
                x1a, y1a, x2a, y2a = 0, 0, xc, yc
                x1b, y1b, x2b, y2b = DIM - xc, DIM - yc, DIM, DIM  # from bottom right
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, 0, DIM, yc
                x1b, y1b, x2b, y2b = 0, DIM - yc, DIM - xc, DIM  # from bottom left
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = 0, yc, xc, DIM
                x1b, y1b, x2b, y2b = DIM - xc, 0, DIM, DIM - yc  # from top right
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc, DIM, DIM
                x1b, y1b, x2b, y2b = 0, 0, DIM - xc, DIM - yc  # from top left
            # Copy-paste from the selected source image (not always the first four)
            mosaic_image[y1a:y2a, x1a:x2a] = image[src][y1b:y2b, x1b:x2b]
        # Append the mosaic sample
        final_imgs.append(mosaic_image)
    # Labels are returned unchanged, so they do not yet match the mosaics
    return final_imgs, label
The augmented samples, currently with the wrong labels.
data, label = mosaicmix(train_images, train_labels, 32)
plt.imshow(data[5]/255)
However, here are some more examples to motivate you. Data is from the Cassava Leaf competition.
We already know that, in CutMix, λ is a float drawn from the beta distribution Beta(α, α). We have seen that it performs best when α = 1. Now, if we always fix α = 1, we can say that λ is sampled from the uniform distribution on [0, 1]. Put simply, λ is just a floating-point number between 0 and 1.
So, for only 2 images, if we use λ for the 1st image, then we can compute the remaining unknown portion simply as 1-λ.
But for 3 images, if we use λ for the 1st image, we cannot compute the other 2 unknowns from that single λ. If we really want to do so, we need 2 random numbers for 3 images. In the same way, for n images we need n-1 random variables. And in all cases, the weights must sum to 1 (for example, λ + (1-λ) == 1). If the sum is not 1, the label will be wrong!
For this purpose, the Dirichlet distribution might be helpful, because it generates quantities that sum to 1. A Dirichlet-distributed random variable can be seen as a multivariate generalization of a Beta-distributed one.
>>> np.random.dirichlet((1, 1), 1) # for 2 images. Equivalent to λ and (1-λ)
array([[0.92870347, 0.07129653]])
>>> np.random.dirichlet((1, 1, 1), 1) # for 3 images.
array([[0.38712673, 0.46132787, 0.1515454 ]])
>>> np.random.dirichlet((1, 1, 1, 1), 1) # for 4 images.
array([[0.59482542, 0.0185333 , 0.33322484, 0.05341645]])
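To make the label side concrete, here is a small sketch that blends n one-hot labels with Dirichlet weights. It assumes a symmetric Dirichlet with α = 1; the function name `mix_labels` and the example classes are illustrative only.

```python
import numpy as np

def mix_labels(one_hot_labels, alpha=1.0, rng=None):
    """Blend n one-hot labels with weights drawn from a symmetric Dirichlet."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(one_hot_labels, dtype=np.float64)  # shape (n, num_classes)
    weights = rng.dirichlet(np.full(len(labels), alpha))   # shape (n,), sums to 1
    return weights @ labels, weights

labels = np.eye(10)[[0, 3, 3, 8]]  # four one-hot labels; class 3 appears twice
soft_label, weights = mix_labels(labels, rng=np.random.default_rng(0))
```

Because the weights sum to 1, the soft label does too, and a class that appears in several source images simply collects the sum of their weights.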
In CutMix, the size of the cropped part of an image is tied to λ, which also weights the corresponding labels. So, with multiple λ values, you need to compute the crop sizes accordingly.
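For the two-image case, the relation between λ and the crop can be sketched as follows. This follows the usual CutMix recipe, where the box sides scale with sqrt(1 - λ); the helper name `cutmix_box` and the random test images are assumptions made for illustration.

```python
import numpy as np

def cutmix_box(height, width, lam, rng=None):
    """Return a box (y1, y2, x1, x2) covering roughly (1 - lam) of the image."""
    rng = np.random.default_rng() if rng is None else rng
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.integers(height), rng.integers(width)  # random box centre
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, height)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, width)
    return int(y1), int(y2), int(x1), int(x2)

rng = np.random.default_rng(0)
image_a = rng.random((32, 32, 3))
image_b = rng.random((32, 32, 3))
lam = rng.beta(1.0, 1.0)
y1, y2, x1, x2 = cutmix_box(32, 32, lam, rng)
mixed = image_a.copy()
mixed[y1:y2, x1:x2] = image_b[y1:y2, x1:x2]  # paste the patch from image_b
# Clipping at the border can shrink the box, so recompute lam from the real area.
lam_adjusted = 1.0 - (y2 - y1) * (x2 - x1) / (32 * 32)
```

Here `lam_adjusted` would weight the label of `image_a` and `1 - lam_adjusted` the label of `image_b`, so the label weights match the pixels actually shown.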
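# Let's say for 4 images.
# I am not sure this is the proper way.
image_list = [4 images]  # placeholder: four images of the same size
label_list = [4 labels]  # placeholder: four one-hot labels
new_img = np.zeros_like(image_list[0])
beta_list = np.random.dirichlet((1, 1, 1, 1), 1)[0]
for idx, beta in enumerate(beta_list):
    # get_cropping_params is a hypothetical helper, something like this
    x0, y0, w, h = get_cropping_params(beta, new_img)
    new_img[y0:y0 + h, x0:x0 + w] = image_list[idx][y0:y0 + h, x0:x0 + w]
    label_list[idx] = label_list[idx] * beta
new_label = sum(label_list)  # weighted labels add up to one soft label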
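One runnable variant, offered only as a sketch and not as the proper way: for a 2x2 mosaic with a single split point, four arbitrary Dirichlet weights cannot in general be matched by the four tile areas, so this version goes the other way round. It samples the split point first and derives the label weights from the resulting tile areas, which sum to 1 by construction. The name `mosaic_with_labels` is hypothetical, and each tile is copied from the same region of its source image for simplicity.

```python
import numpy as np

def mosaic_with_labels(images, one_hot_labels, rng=None, minfrac=0.25, maxfrac=0.75):
    """Build one 2x2 mosaic from 4 images and weight the labels by tile area."""
    rng = np.random.default_rng() if rng is None else rng
    images = np.asarray(images, dtype=np.float32)          # (4, H, W, C)
    labels = np.asarray(one_hot_labels, dtype=np.float32)  # (4, num_classes)
    _, h, w, _ = images.shape
    yc = int(rng.integers(int(h * minfrac), int(h * maxfrac)))
    xc = int(rng.integers(int(w * minfrac), int(w * maxfrac)))
    # Target regions (y1, y2, x1, x2): top-left, top-right, bottom-left, bottom-right.
    regions = [(0, yc, 0, xc), (0, yc, xc, w), (yc, h, 0, xc), (yc, h, xc, w)]
    mosaic = np.zeros_like(images[0])
    weights = np.zeros(4, dtype=np.float32)
    for k, (y1, y2, x1, x2) in enumerate(regions):
        # Simplification: copy the same region from the source image, no shifting.
        mosaic[y1:y2, x1:x2] = images[k, y1:y2, x1:x2]
        weights[k] = (y2 - y1) * (x2 - x1) / (h * w)  # fraction of the mosaic area
    return mosaic, weights @ labels, weights

rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3))
labels = np.eye(10)[[1, 4, 4, 9]]  # four one-hot labels; class 4 appears twice
mosaic, soft_label, weights = mosaic_with_labels(images, labels, rng)
```

The design choice is the same as in CutMix: each source image contributes to the label in proportion to the area it occupies in the final sample.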