
Data augmentation in the object detection API: random_image_scale

I am trying to use the data augmentation features of the object detection API, specifically random_image_scale.

Digging a bit, I found the function that implements it (pasted below). Am I missing something, or are the ground-truth boxes not handled here? I have looked around and did not find anything. If the ground truth is not adjusted to match the scaling applied to the image, it will corrupt the training of the model, won't it?

Please let me know if I am missing something, or whether I should avoid this feature when training my network.

The file is /object_detection/core/preprocessor.py

def random_image_scale(image,
                       masks=None,
                       min_scale_ratio=0.5,
                       max_scale_ratio=2.0,
                       seed=None):
  """Scales the image size.

  Args:
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels].
    masks: (optional) rank 3 float32 tensor containing masks with
      size [height, width, num_masks]. The value is set to None if there are no
      masks.
    min_scale_ratio: minimum scaling ratio.
    max_scale_ratio: maximum scaling ratio.
    seed: random seed.

  Returns:
    image: image which is the same rank as input image.
    masks: If masks is not none, resized masks which are the same rank as input
      masks will be returned.
  """
  with tf.name_scope('RandomImageScale', values=[image]):
    result = []
    image_shape = tf.shape(image)
    image_height = image_shape[0]
    image_width = image_shape[1]
    size_coef = tf.random_uniform([],
                                  minval=min_scale_ratio,
                                  maxval=max_scale_ratio,
                                  dtype=tf.float32, seed=seed)
    image_newysize = tf.to_int32(
        tf.multiply(tf.to_float(image_height), size_coef))
    image_newxsize = tf.to_int32(
        tf.multiply(tf.to_float(image_width), size_coef))
    image = tf.image.resize_images(
        image, [image_newysize, image_newxsize], align_corners=True)
    result.append(image)
    if masks is not None:
      masks = tf.image.resize_nearest_neighbor(
          masks, [image_newysize, image_newxsize], align_corners=True)
      result.append(masks)
    return tuple(result)
rpicatoste asked Oct 28 '22 23:10



1 Answer

If you are using a TFRecord file, the box boundaries are stored not as absolute pixel values but as coordinates relative to the image size (fractions of its height and width). So if you scale the image, the box coordinates stay the same.

So using that should be fine.
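
To illustrate why relative coordinates survive the resize, here is a minimal sketch in plain Python. It is not code from the object detection API: the helper names `pixels_to_normalized` and `normalized_to_pixels`, the image size, and the box values are all made up for the example, and the box order [ymin, xmin, ymax, xmax] is an assumption.

```python
def pixels_to_normalized(box, height, width):
    """Convert a (ymin, xmin, ymax, xmax) pixel box to fractions of the image size."""
    ymin, xmin, ymax, xmax = box
    return (ymin / height, xmin / width, ymax / height, xmax / width)


def normalized_to_pixels(box, height, width):
    """Convert a (ymin, xmin, ymax, xmax) relative box back to pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (ymin * height, xmin * width, ymax * height, xmax * width)


# Hypothetical original image and ground-truth box in pixels.
height, width = 400, 600
pixel_box = (100.0, 150.0, 300.0, 450.0)

# What would be stored with relative coordinates.
normalized_box = pixels_to_normalized(pixel_box, height, width)

# Scale the image the way random_image_scale does: one coefficient for both axes.
size_coef = 1.5
new_height = int(height * size_coef)
new_width = int(width * size_coef)

# The relative box is left untouched; only its pixel interpretation changes.
scaled_pixel_box = normalized_to_pixels(normalized_box, new_height, new_width)

# The box expected if the pixel coordinates had been scaled by hand.
expected_box = tuple(coord * size_coef for coord in pixel_box)
```

Here `scaled_pixel_box` matches `expected_box`, so the unchanged relative box still covers the same object in the resized image.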

Falco answered Nov 16 '22 10:11
