How can I handle multiple losses in PyTorch?

Tags:

python

pytorch

[Image: https://i.stack.imgur.com/yBrXW.png]

As shown in the image, I want to use some auxiliary losses to improve my model's performance.
Which of the following implements this correctly in PyTorch?

    # one
    loss1.backward()
    loss2.backward()
    loss3.backward()
    optimizer.step()

    # two
    loss1.backward()
    optimizer.step()
    loss2.backward()
    optimizer.step()
    loss3.backward()
    optimizer.step()

    # three
    loss = loss1 + loss2 + loss3
    loss.backward()
    optimizer.step()

Thanks for your answer!

asked Jan 01 '19 10:01

heiheihei

2 Answers

The first and third approaches are equivalent and correct, while the second approach is wrong.

The reason is that in PyTorch, the gradients of the shared lower layers are not overwritten by subsequent backward() calls; they are accumulated (summed) into each parameter's .grad. This makes the first and third approaches identical. The first may be preferable if you are short on GPU memory or RAM, for the same reason gradient accumulation works: one batch of 1024 with an immediate backward() + step() is equivalent to 8 batches of 128 with 8 backward() calls and a single step() at the end, provided the losses are scaled so that they add up to the full-batch loss.
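The batch-splitting claim can be checked with a small sketch. The data, the linear model and the chunk count below are made up for illustration and are not from the original post; the point is only that accumulated micro-batch gradients match the full-batch gradient once each mean-reduced loss is divided by the number of chunks.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data and model, chosen only for illustration.
inputs = torch.randn(1024, 4)
targets = torch.randn(1024, 1)
model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()  # default reduction is "mean"

# (a) One batch of 1024 and a single backward() call.
model.zero_grad()
criterion(model(inputs), targets).backward()
full_batch_grad = model.weight.grad.clone()

# (b) Eight batches of 128; each backward() adds into .grad.
model.zero_grad()
num_chunks = 8
for x_chunk, y_chunk in zip(inputs.chunk(num_chunks), targets.chunk(num_chunks)):
    # Divide by num_chunks so the eight mean-reduced losses sum to the full-batch mean.
    loss = criterion(model(x_chunk), y_chunk) / num_chunks
    loss.backward()
accumulated_grad = model.weight.grad.clone()

# The two gradients agree up to floating-point rounding.
grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-5)
print(grads_match)
```

Without the division by num_chunks, the accumulated gradient would be eight times larger than the full-batch one, which is why the scaling matters whenever the loss uses a mean reduction.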

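To illustrate the idea, here is a simple example. We want to move our tensor x as close as possible to 40, 50 and 60 simultaneously:

    import torch

    # criterion is not defined in the post; a squared error is assumed here,
    # which is consistent with the gradients printed further down.
    criterion = lambda target, x: (target - x) ** 2

    x = torch.tensor([1.0], requires_grad=True)
    loss1 = criterion(40, x)
    loss2 = criterion(50, x)
    loss3 = criterion(60, x)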

Now the first approach (x.grad holds the gradient currently accumulated for x):

    loss1.backward()
    loss2.backward()
    loss3.backward()

    print(x.grad)

This outputs tensor([-294.]).

(EDIT: for more complicated computational graphs, where the losses share intermediate results, pass retain_graph=True to the first two backward() calls.)

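The third approach (starting again from a fresh x and freshly computed losses, so that the gradient from the previous run is not counted twice):

    loss = loss1 + loss2 + loss3
    loss.backward()
    print(x.grad)

Again the output is tensor([-294.]).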
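When the losses share part of the graph, as they do with auxiliary heads on a common trunk, the first approach needs retain_graph=True on every backward() call except the last. The following is a minimal sketch under assumed names: the trunk, the three heads and the random data are hypothetical and only stand in for a real model.

```python
import torch

torch.manual_seed(0)

# Hypothetical network: a shared trunk feeding three heads (one main, two auxiliary).
trunk = torch.nn.Linear(4, 8)
heads = [torch.nn.Linear(8, 1) for _ in range(3)]
inputs = torch.randn(16, 4)
targets = torch.randn(16, 1)
criterion = torch.nn.MSELoss()

def compute_losses():
    features = torch.relu(trunk(inputs))  # shared by all three losses
    return [criterion(head(features), targets) for head in heads]

# Approach one: separate backward() calls on a shared graph.
trunk.zero_grad()
loss1, loss2, loss3 = compute_losses()
loss1.backward(retain_graph=True)  # keep the shared part of the graph alive
loss2.backward(retain_graph=True)
loss3.backward()                   # the last call may free the graph
grad_separate = trunk.weight.grad.clone()

# Approach three: sum the losses and call backward() once.
trunk.zero_grad()
loss1, loss2, loss3 = compute_losses()
(loss1 + loss2 + loss3).backward()
grad_summed = trunk.weight.grad.clone()

# Both approaches leave the same gradient on the shared trunk.
same_gradients = torch.allclose(grad_separate, grad_summed, atol=1e-5)
print(same_gradients)
```

Summing first walks the shared trunk once instead of three times, which is one practical reason to prefer the third approach when the graph is shared.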

The second approach is different because optimizer.zero_grad() is never called after step(). Gradients therefore keep accumulating, so every step() call reuses the gradients of the earlier backward() calls: the first loss's gradient is applied three times, the second's twice, and the third's once. For example, if the three losses give gradients 5, 1 and 4 for the same weight, the weight receives 5*3 + 1*2 + 4*1 = 21 in total instead of 10 (= 5 + 1 + 4).
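The arithmetic for the 5, 1, 4 example can be traced in plain Python. This is a simplified bookkeeping sketch, not PyTorch code: it assumes a plain SGD-style update with no momentum and treats the three gradients as fixed numbers, ignoring that the weight changes between steps. The function name total_applied is made up for illustration.

```python
# Gradients that loss1, loss2 and loss3 contribute to one weight.
per_loss_grads = [5.0, 1.0, 4.0]

def total_applied(grads, zero_grad_between_steps):
    """Return the total gradient consumed by step() over all calls."""
    accumulated = 0.0  # plays the role of weight.grad
    applied = 0.0      # sum of everything step() has used so far
    for g in grads:
        if zero_grad_between_steps:
            accumulated = 0.0
        accumulated += g        # backward() adds into .grad
        applied += accumulated  # step() uses whatever .grad currently holds
    return applied

stale = total_applied(per_loss_grads, zero_grad_between_steps=False)
clean = total_applied(per_loss_grads, zero_grad_between_steps=True)
print(stale, clean)  # 21.0 10.0
```

Without zeroing, the successive step() calls see 5, then 6, then 10, which adds up to 21; with zeroing they see 5, 1 and 4, which adds up to 10.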

answered Oct 04 '22 13:10

Shihab Shahriar Khan

-- Comment on first approach removed, see other answer --

Your second approach would require backpropagating with retain_graph=True, which incurs heavy computational costs. Moreover, it is wrong: the first optimizer step updates the network weights, yet the next backward() call computes gradients with respect to the weights as they were before that update, so the second step() call injects noise into your updates. If, on the other hand, you ran another forward() pass in order to backpropagate through the updated weights, you would end up with an asynchronous optimization, since the first layers would be updated once by the first step() and then once more by each subsequent step() call (not wrong per se, but inefficient and probably not what you wanted in the first place).

Long story short, the way to go is the last approach: reduce each loss to a scalar, sum the losses, and backpropagate the resulting loss. Side note: make sure your reduction scheme makes sense (e.g. if you are using reduction='sum' and the losses correspond to multi-label classification, remember that the number of classes per objective is different, so the relative weight contributed by each loss is also different).
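The side note about reduction can be made concrete with a small sketch. The two objectives below, with 3 and 10 labels, are hypothetical. All-zero logits are used because binary cross-entropy then equals ln(2) for every element regardless of the target, so the scale difference comes purely from the reduction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size = 16

# Hypothetical multi-label objectives with 3 and 10 labels.
logits_small = torch.zeros(batch_size, 3)
logits_large = torch.zeros(batch_size, 10)
targets_small = torch.randint(0, 2, (batch_size, 3)).float()
targets_large = torch.randint(0, 2, (batch_size, 10)).float()

bce = F.binary_cross_entropy_with_logits
sum_small = bce(logits_small, targets_small, reduction="sum")
sum_large = bce(logits_large, targets_large, reduction="sum")
mean_small = bce(logits_small, targets_small, reduction="mean")
mean_large = bce(logits_large, targets_large, reduction="mean")

# With "sum", the 10-label objective weighs 10/3 times as much as the 3-label one;
# with "mean", both are on the same scale.
ratio_sum = (sum_large / sum_small).item()
ratio_mean = (mean_large / mean_small).item()
print(ratio_sum, ratio_mean)
```

If the summed total loss is built from reduction='sum' terms like these, the larger objective dominates the gradient unless the terms are reweighted or a mean reduction is used.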

answered Oct 04 '22 13:10

KonstantinosKokos
