
Error using data augmentation options in the Object Detection API

I am trying to use the data_augmentation_options in the .config files to train a network, specifically an ssd_mobilenet_v1, but when I activate the option random_adjust_brightness, I very quickly get the error message pasted below (I activate the option after step 110,000).

I tried reducing the default value:

optional float max_delta=1 [default=0.2];

But the result was the same.

Any idea why? The images are RGB from png files (from the Bosch Small Traffic Lights Dataset).
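
For reference, this is how I enable the option in the pipeline .config file (a sketch; the surrounding train_config block follows the standard Object Detection API proto-text layout):

```
train_config {
  data_augmentation_options {
    random_adjust_brightness {
      max_delta: 0.2
    }
  }
}
```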

INFO:tensorflow:global step 110011: loss = 22.7990 (0.357 sec/step)
INFO:tensorflow:global step 110012: loss = 47.8811 (0.401 sec/step)
2017-11-16 11:02:29.114785: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114895: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114969: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115043: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115112: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
...

Edit: here is the workaround I have found. The inf or nan appears in the loss, so I checked the function in /object_detection/core/preprocessor.py that performs the brightness randomization:

def random_adjust_brightness(image, max_delta=0.2):
  """Randomly adjusts brightness.

  Makes sure the output image is still between 0 and 1.

  Args:
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
           with pixel values varying between [0, 1].
    max_delta: how much to change the brightness. A value between [0, 1).

  Returns:
    image: image which is the same shape as input image.
  """
  with tf.name_scope('RandomAdjustBrightness', values=[image]):
    image = tf.image.random_brightness(image, max_delta)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image

It assumes that the image values are between 0.0 and 1.0. Is it possible that the images are actually arriving zero-mean, or even with a different range? In that case, the clipping corrupts them and leads to the failure. Long story short: I commented out the clipping line and it is working (we will see the results).
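
To illustrate the suspicion: if the images arrive zero-mean (say, normalized to [-1, 1]) rather than in [0, 1], clipping to [0, 1] collapses every negative pixel to zero. A minimal sketch with NumPy, using made-up pixel values:

```python
import numpy as np

# Hypothetical pixels from an image normalized to zero mean, range [-1, 1]
image = np.array([-0.8, -0.2, 0.3, 0.9], dtype=np.float32)

# The same clip that random_adjust_brightness applies, assuming [0, 1] input
clipped = np.clip(image, 0.0, 1.0)

print(clipped)  # every negative pixel collapses to 0.0 -> [0. 0. 0.3 0.9]
```

Roughly half of a zero-mean image is negative, so the clip destroys half the information, which plausibly explains the loss blowing up to inf/nan.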

rpicatoste asked Nov 07 '22
1 Answer

Often, getting LossTensor is inf or nan. : Tensor had NaN values is due to an error in the bounding boxes / annotations (Source: https://github.com/tensorflow/models/issues/1881).

I know that the Bosch Small Traffic Light Dataset has some annotations that extend outside the image dimensions. For example, the images in that dataset are 720 pixels high, but some bounding boxes have y-coordinates greater than 720. This is common because whenever the car recording the sequence passes under a traffic light, part of the light is visible and part is cut off.
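
One way to sanitize such annotations is to clip each box back into the image bounds before generating the training records. A minimal sketch (the helper name and box layout are my assumptions, not part of the Object Detection API):

```python
import numpy as np

def clip_boxes_to_image(boxes, height, width):
    """Clip pixel boxes [ymin, xmin, ymax, xmax] to the image bounds."""
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0.0, height)  # y coords
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0.0, width)   # x coords
    return boxes

# A traffic light partially cut off at the top of a 720-pixel-high image,
# and one extending past the bottom edge (made-up coordinates)
boxes = [[-12.0, 100.0, 40.0, 130.0],
         [700.0, 0.0, 750.0, 50.0]]
print(clip_boxes_to_image(boxes, height=720, width=1280))
```

Alternatively, boxes that lie mostly outside the image could simply be dropped; either way the loss no longer sees coordinates outside the valid range.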

I know this isn't an exact answer to your question, but hopefully it provides insight into a possible reason for the problem. Removing annotations that extend outside the image dimensions may help; however, I am dealing with the same problem even though I am not using image preprocessing. On the same dataset, I encounter the LossTensor is inf or nan. : Tensor had NaN values error roughly every 8,000 steps.

Adpon answered Nov 15 '22