Data augmentation in the object detection API: random_image_scale

Question

I am trying to use the data augmentation features of the object detection API, specifically random_image_scale.

Digging a bit, I found the function implementing it (pasted below). Am I missing something, or are the ground-truth boxes not handled here? I have looked around and did not find anything. If the ground truth is not modified to match the scaling applied to the image, it will confuse the model being trained, won't it?

Please let me know if I am missing something, or whether I should avoid this feature when training my network.

The file is /object_detection/core/preprocessor.py

def random_image_scale(image,
                       masks=None,
                       min_scale_ratio=0.5,
                       max_scale_ratio=2.0,
                       seed=None):
  """Scales the image size.

  Args:
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels].
    masks: (optional) rank 3 float32 tensor containing masks with
      size [height, width, num_masks]. The value is set to None if there are no
      masks.
    min_scale_ratio: minimum scaling ratio.
    max_scale_ratio: maximum scaling ratio.
    seed: random seed.

  Returns:
    image: image which is the same rank as input image.
    masks: If masks is not none, resized masks which are the same rank as input
      masks will be returned.
  """
  with tf.name_scope('RandomImageScale', values=[image]):
    result = []
    image_shape = tf.shape(image)
    image_height = image_shape[0]
    image_width = image_shape[1]
    size_coef = tf.random_uniform([],
                                  minval=min_scale_ratio,
                                  maxval=max_scale_ratio,
                                  dtype=tf.float32, seed=seed)
    image_newysize = tf.to_int32(
        tf.multiply(tf.to_float(image_height), size_coef))
    image_newxsize = tf.to_int32(
        tf.multiply(tf.to_float(image_width), size_coef))
    image = tf.image.resize_images(
        image, [image_newysize, image_newxsize], align_corners=True)
    result.append(image)
    if masks:
      masks = tf.image.resize_nearest_neighbor(
          masks, [image_newysize, image_newxsize], align_corners=True)
      result.append(masks)
    return tuple(result)

Falco · Accepted Answer

If you are using a TFRecord file, the box boundaries are stored not as absolute pixels but as relative values (fractions of the image height and width). So if you scale the image, the boxes stay the same.

So using that should be fine.
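
To make the reasoning concrete, here is a minimal sketch in plain Python (no TensorFlow). It assumes boxes are given as [ymin, xmin, ymax, xmax] fractions of the image size; the helper names `to_pixels` and `scaled_size` are made up for this illustration and are not part of the API. It shows that the same relative box lands on the same region of the image after the image is resized.

```python
def to_pixels(box, height, width):
    """Convert a relative [ymin, xmin, ymax, xmax] box to pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (ymin * height, xmin * width, ymax * height, xmax * width)


def scaled_size(height, width, size_coef):
    """New image size after scaling (hypothetical helper).

    Truncates to int, similar to the integer cast in random_image_scale.
    """
    return int(height * size_coef), int(width * size_coef)


# Assumed example values: a 400x600 image and one relative box.
box = (0.25, 0.5, 0.75, 1.0)
height, width = 400, 600
size_coef = 1.5

new_height, new_width = scaled_size(height, width, size_coef)

# The relative box itself is never modified; only the image size changes.
before = to_pixels(box, height, width)
after = to_pixels(box, new_height, new_width)

# Each pixel coordinate grows by the same factor as the image,
# so the box still covers the same part of the picture.
ratios = [a / b for a, b in zip(after, before)]
print(before, after, ratios)
```

If the boxes were stored in absolute pixels instead, they would have to be multiplied by the same scale factor as the image, which is the situation the question was worried about.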

Tags: tensorflow, object-detection, object-detection-api

rpicatoste
