I am reading the faster-rcnn code in the tensorflow models repository, and I am confused by the use of tf.stop_gradient.
Consider the following code snippet:
if self._is_training:
  proposal_boxes = tf.stop_gradient(proposal_boxes)
  if not self._hard_example_miner:
    (groundtruth_boxlists, groundtruth_classes_with_background_list, _,
     groundtruth_weights_list
    ) = self._format_groundtruth_data(true_image_shapes)
    (proposal_boxes, proposal_scores,
     num_proposals) = self._sample_box_classifier_batch(
         proposal_boxes, proposal_scores, num_proposals,
         groundtruth_boxlists, groundtruth_classes_with_background_list,
         groundtruth_weights_list)
More code follows. My question is: what happens if tf.stop_gradient is not applied to proposal_boxes?
tf.gradients(loss, embed) computes the partial derivative of the tensor loss with respect to the tensor embed. TensorFlow computes this partial derivative by backpropagation, so it is expected behavior that evaluating the result of tf.gradients(...) performs backpropagation.
tf.stop_gradient() is an operation that acts as the identity function in the forward direction but stops the accumulated gradient from flowing through that operator in the backward direction.
Note that evaluating such a gradient tensor does not perform any variable updates, because the expression does not include any assignment operations; updates only happen once the gradients are applied by an optimizer.
(The analogous idea in PyTorch is disabling gradient calculation, e.g. with torch.no_grad(): it is useful for inference, when you are sure that you will not call Tensor.backward(), and it reduces memory consumption for computations that would otherwise have requires_grad=True.)
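To make the forward-identity / backward-block behavior concrete, here is a minimal, self-contained sketch (toy values, not the Object Detection API's actual code):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                      # forward: y = 9
    z = tf.stop_gradient(y) * x    # forward: z = 27, same value as y * x
grad = tape.gradient(z, x)
# Backward: the path through y is blocked, so stop_gradient(y) is treated
# as a constant and dz/dx = 9.0 instead of d(x**3)/dx = 27.0.
print(grad.numpy())  # 9.0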
This is really a good question, because this simple tf.stop_gradient line is crucial when training faster_rcnn models. Here is why it is needed during training.
Faster_rcnn models are two-stage detectors, and the loss function has to serve the goals of both stages: in faster_rcnn, the rpn loss as well as the fast_rcnn loss both need to be minimized.
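In joint training this simply means that the losses of both stages are summed into a single objective; the following placeholder sketch (illustrative scalars, not the API's actual loss tensors) shows the idea:

import tensorflow as tf

# Illustrative placeholder values, not the API's real loss tensors.
rpn_objectness_loss = tf.constant(0.3)
rpn_localization_loss = tf.constant(0.2)
second_stage_classification_loss = tf.constant(0.4)
second_stage_localization_loss = tf.constant(0.1)

# Joint training minimizes the sum of both stages' losses.
total_loss = (rpn_objectness_loss + rpn_localization_loss
              + second_stage_classification_loss + second_stage_localization_loss)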
Here is what the paper says in section 3.2:
Both RPN and Fast R-CNN, trained independently, will modify their convolutional layers in different ways. We therefore need to develop a technique that allows for sharing convolutional layers between the two networks, rather than learning two separate networks.
The paper then describes three training schemes; in the original paper the authors adopted the first one, alternating training: train the RPN first and then train Fast R-CNN.
The second scheme is approximate joint training. It is easy to implement, and this is the scheme adopted by the API. Fast R-CNN takes as input the coordinates of the bounding boxes predicted by the rpn, so the Fast R-CNN loss also has gradients w.r.t. those bounding box coordinates. In this training scheme, however, those gradients are ignored, which is exactly why tf.stop_gradient is used. The paper reports that this scheme reduces training time by 25-50%.
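A tiny, self-contained sketch of this idea (made-up tensors and layer stand-ins, not the Object Detection API's actual code) shows that the second-stage loss still trains the shared features while contributing no gradient along the box-coordinate path:

import tensorflow as tf

# Stand-ins for shared convolutional features and the RPN box-regression head.
feature_weights = tf.Variable(tf.random.normal([8, 8]))
rpn_box_weights = tf.Variable(tf.random.normal([8, 4]))

with tf.GradientTape() as tape:
    shared_features = tf.random.normal([1, 8]) @ feature_weights
    proposal_boxes = shared_features @ rpn_box_weights     # "RPN" box predictions
    proposal_boxes = tf.stop_gradient(proposal_boxes)      # approximate joint training
    # Stand-in for ROI pooling + second stage: depends on the shared features
    # and on the (now constant) proposal coordinates.
    roi_features = shared_features * tf.reduce_mean(proposal_boxes)
    second_stage_loss = tf.reduce_sum(tf.square(roi_features))

grads = tape.gradient(second_stage_loss, [feature_weights, rpn_box_weights])
print(grads[0] is not None)  # True: the shared features still learn from this loss
print(grads[1] is None)      # True: no gradient reaches the RPN box head via the coordinates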
The third scheme is non-approximate joint training, where no tf.stop_gradient is needed. The paper reports that building an RoI pooling layer that is differentiable w.r.t. the box coordinates is a nontrivial problem.
But why are those gradients ignored?
It turns out that the RoI pooling layer is fully differentiable, but the main reason to favor scheme two is that scheme three would make training unstable early on.
One of the authors of the API had a really good answer here
Some further reading regarding approximate joint training.