
Calculate gradient for only part of a shared variable array

I want to do the following:

import theano, numpy, theano.tensor as T

a = T.fvector('a')

w = theano.shared(numpy.array([1, 2, 3, 4], dtype=theano.config.floatX))
w_sub = w[1]

b = T.sum(a * w)

grad = T.grad(b, w_sub)

Here, w_sub is, for example, w[1], but I do not want to explicitly write b as a function of w_sub. Despite going through this and other related issues, I can't solve it.

This is just to show my problem. What I actually want to do is a sparse convolution with Lasagne. The zero entries in the weight matrix do not need to be updated, so there is no need to calculate the gradient for those entries of w.
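To illustrate the masking idea I have in mind in plain NumPy (illustrative names only, not my actual Lasagne code):

```python
import numpy as np

# Hypothetical sketch: keep the zero entries of a sparse weight matrix
# fixed by masking the gradient before the update.
w = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
mask = (w != 0).astype(w.dtype)   # 1 where a weight exists, 0 elsewhere

grad = np.full_like(w, 0.5)       # stand-in gradient from some loss
lr = 0.1
w -= lr * grad * mask             # zero entries stay exactly zero
```

The question is how to express this kind of partial update in Theano without rewriting b in terms of the sub-tensor by hand.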

This is now the complete error message:

Traceback (most recent call last):
  File "D:/Jeroen/Project_Lasagne_General/test_script.py", line 9, in <module>
    grad = T.grad(b, w_sub)
  File "C:\Anaconda2\lib\site-packages\theano\gradient.py", line 545, in grad
    handle_disconnected(elem)
  File "C:\Anaconda2\lib\site-packages\theano\gradient.py", line 532, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0
Backtrace when the node is created:
  File "D:/Jeroen/Project_Lasagne_General/test_script.py", line 6, in <module>
    w_sub = w[1]
Jeroen Bertels asked Mar 27 '26


1 Answer

When Theano compiles the graph, it only sees the variables that are explicitly used in the graph. In your example, w_sub is not used in the computation of b and is therefore not part of the computation graph.

Using Theano's printing library with the following code, you can see from the graph visualization that w_sub is indeed not part of the graph of b.

import theano
import theano.tensor as T
import numpy
import theano.d3viz as d3v

a = T.fvector('a')
w = theano.shared(numpy.array([1, 2, 3, 4], dtype=theano.config.floatX))
w_sub = w[1]
b = T.sum(a * w)

o = b, w_sub

d3v.d3viz(o, 'b.html')

To fix the problem, you need to explicitly use w_sub in the computation of b. Then you will be able to compute the gradient of b with respect to w_sub and update the values of the shared variable, as in the following example:

import theano
import theano.tensor as T
import numpy


a = T.fvector('a')
w = theano.shared(numpy.array([1, 2, 3, 4], dtype=theano.config.floatX))
w_sub = w[1]
b = T.sum(a * w_sub)
grad = T.grad(b, w_sub)
updates = [(w, T.inc_subtensor(w_sub, -0.1*grad))]

f = theano.function([a], b, updates=updates, allow_input_downcast=True)

f(numpy.arange(10))
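For clarity, T.inc_subtensor(w_sub, delta) builds a new tensor equal to w with delta added only at the indexed position, which is what makes the partial update possible. A plain NumPy analogue of that update step (illustrative, not Theano code):

```python
import numpy as np

# NumPy analogue of updates = [(w, T.inc_subtensor(w_sub, -0.1 * grad))]:
# produce a copy of w where only index 1 is changed.
w = np.array([1.0, 2.0, 3.0, 4.0])
grad = 5.0                      # stand-in for the computed gradient value
w_new = w.copy()
w_new[1] += -0.1 * grad         # only w[1] is updated; the rest is untouched
```

In the Theano version, assigning this result back to the shared variable via `updates` is what applies the partial update on each call to f.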
LeCodeDuGui answered Apr 01 '26
