I am currently trying to train the Faster RCNN Inception V2 model (pre-trained on COCO) on the GTSDB dataset. I have the FullIJCNN dataset and I divided it into three parts: training, validation and test. I then created three CSV files, one per split, and created TFRecord files for training and validation. I also have a code block that reads the ground truth box coordinates for each image and draws boxes around the traffic signs on the image. It also writes the class labels correctly. Here are a few examples. Again, these boxes are not predicted by a network. They are drawn by a function from the ground truth.
[Image: Drawn Boxes 1]
[Image: Drawn Boxes 2]
Then I created a label file using the README file included in the dataset folder and added a "0 background" line as the first line of labels.txt to make it work with my code, because it was throwing an index out of range error (I think this was a stupid thing to do). However, there is no "background" entry in my .pbtxt file, so its ids start from 1. Next I configured the faster_rcnn_inception_v2_coco.config file: I changed num_classes: 90 to num_classes: 43 since the dataset has 43 classes, and num_examples: 5000 to num_examples: 186 since my split has 186 test examples. I left num_steps: 200000 as it is. Finally, I started the training job by running:
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=50000 \
    --num_eval_steps=2000 \
    --alsologtostderr
This is the output (sorry for the code block, I don't know how to format logs specifically):
import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7fc4cd6a4938>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/models/research/object_detection/core/box_predictor.py:407: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py:2037: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/models/research/object_detection/core/losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-07-26 09:48:21.785041: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-26 09:48:21.923329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 9b2f:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-26 09:48:21.923382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:48:22.153991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:48:22.154053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-07-26 09:48:22.154075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-07-26 09:48:22.154333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
2018-07-26 09:58:31.794649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:58:31.794723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:58:31.794747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-07-26 09:58:31.794765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-07-26 09:58:31.794884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
WARNING:tensorflow:Ignoring ground truth with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 1416415107 since it was previously added
It created lots of warnings like this:
WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added
The reason for these messages is that num_examples has been set to 2000, even though my original config file has the line num_examples: 186. I don't understand why it creates a new config file with a different parameter. After a log full of those messages, it prints a report, but I am not sure what exactly it is trying to tell me. Here is the report:
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.07s).
Accumulating evaluation results...
DONE (t=0.02s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
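A note on reading this report: in the COCO evaluation summary, a value of -1.000 is printed for a row when no ground truth boxes fall into that category, so the medium and large rows here suggest that every evaluated ground truth box was counted as small. The sketch below shows the COCO area thresholds (32² and 96² pixels) as I understand them. The function name coco_area_bucket is hypothetical, and the handling of boxes exactly on a threshold is simplified compared to the real evaluator.

```python
def coco_area_bucket(width_px, height_px):
    """Classify a box by pixel area using the COCO small/medium/large thresholds.

    Hypothetical helper for illustration; boundary handling is simplified.
    """
    area = width_px * height_px
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Example box sizes in pixels, chosen only for illustration.
for w, h in [(16, 16), (48, 48), (128, 128)]:
    print(w, h, coco_area_bucket(w, h))
```

Running a check like this over the box sizes the evaluator actually receives can show whether the ground truth is being read at the expected scale.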
Lastly, I checked Tensorboard to make sure the model is training correctly, but what I see is frustrating. Here are screenshots of the Tensorboard loss graphs of my model:
[Image: Loss]
[Image: General Loss]
I feel like I am doing something wrong. I don't know if this is a specific enough question, but I have tried to give as much detail as possible.
My questions are: What changes should I make in these steps? Why does my function draw the correct boxes while my model can't figure out what's going on? Thanks in advance!
You are receiving the warnings because items from your dataset are being evaluated multiple times. The values you pass for num_train_steps and num_eval_steps should correspond to the batch_size in your train_config and the size of your dataset. For example, if your batch size is 24 and you have 24000 training records, num_train_steps should be set to 1000; num_eval_steps is calculated the same way, using the number of evaluation records. The model_main.py script does not seem to use the values from your pipeline.config file when you pass those values on the command line.
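The arithmetic above can be sketched as follows. This is a minimal illustration, not part of the Object Detection API: the helper name steps_for_one_pass is made up, and an evaluation batch size of 1 is an assumption. Under that assumption, the 186 evaluation records from the question need 186 steps, so --num_eval_steps=2000 would cycle through the same images more than ten times, which matches the repeated "previously added" warnings.

```python
import math


def steps_for_one_pass(num_records, batch_size):
    """Steps needed to see every record once (the last batch may be partial).

    Hypothetical helper for illustration only.
    """
    if num_records <= 0 or batch_size <= 0:
        raise ValueError("num_records and batch_size must be positive")
    return int(math.ceil(num_records / float(batch_size)))


# Numbers from the example in this answer: 24000 records, batch size 24.
train_steps = steps_for_one_pass(24000, 24)

# Numbers from the question: 186 evaluation records, assumed batch size 1.
eval_steps = steps_for_one_pass(186, 1)

print(train_steps, eval_steps)
```

Note that this gives the step count for a single pass over the data; training for several passes means multiplying the training value accordingly, while evaluation only needs one pass.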
I came across the same problem, and after a while I found a solution that worked for me, although it may not be a general one. If your dataset is spread across multiple folders and you use your own TFRecord converter, the cause may be frame names colliding across the whole dataset.
Since I started using the full path as the filename (which avoids collisions), I haven't seen the WARNING anymore. I hope this helps somebody.
import tensorflow as tf
from object_detection.utils import dataset_util

# `filename` holds the full image path, so 'image/filename' and
# 'image/source_id' stay unique even when frames in different folders
# share the same base name.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(im_height),
    'image/width': dataset_util.int64_feature(im_width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
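To check whether a dataset has this kind of naming collision before writing the records, something like the following sketch may help. It is only an illustration under my own assumptions: the helper names unique_source_id and find_collisions and the example paths are hypothetical, and using the path relative to the dataset root is a variation on the full-path idea above.

```python
import os


def unique_source_id(image_path, dataset_root):
    """Build an id from the path relative to dataset_root, as bytes.

    Hypothetical helper: frames with the same base name in different
    folders get different ids.
    """
    rel = os.path.relpath(os.path.abspath(image_path),
                          os.path.abspath(dataset_root))
    return rel.replace(os.sep, "/").encode("utf8")


def find_collisions(ids):
    """Return the sorted list of ids that occur more than once."""
    seen, dupes = set(), set()
    for item in ids:
        if item in seen:
            dupes.add(item)
        seen.add(item)
    return sorted(dupes)


# Made-up example paths: two folders with identically named frames.
root = os.path.join(os.sep, "data")
paths = [os.path.join(root, "a", "0001.png"),
         os.path.join(root, "b", "0001.png")]

basename_dupes = find_collisions([os.path.basename(p) for p in paths])
path_dupes = find_collisions([unique_source_id(p, root) for p in paths])
print(basename_dupes, path_dupes)
```

If the first list is non-empty and the second is empty, base names alone collide and a path-based id removes the collision.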