 

Call a function from scan in Theano

Tags:

theano

I need to execute a theano function a number of times via scan in order to sum up a cost function and use it in a gradient computation. I'm familiar with the deep-learning tutorials that do this, but my data slicing and some other complications mean I need to do this a little differently. Below is a much simplified version of what I'm trying to do.

import numpy
import theano
import theano.tensor as T

tn = testnet()
cost = tn.single_cost()
x = theano.shared(numpy.asarray([7.1, 2.2, 3.4], dtype='float32'))
index = T.lscalar('index')
test_fn = theano.function(inputs=[index], outputs=cost,
    givens={tn.x: x[index:index+1]})

def step(curr):
    return T.constant(test_fn(curr))
outs, _ = theano.scan(step, T.arange(2))

out_fn = theano.function(inputs=[], outputs=outs)
print(out_fn())

In the scan function, the call to test_fn(curr) is giving the error: "Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?"

Even if I pass in an array of values instead of the T.arange(2), I still get the same error. Is there a reason you can't call a function from scan?

In general I'm wondering if there is a way to call a function like this with a series of indexes so that the output can feed into a T.grad() computation (not shown).

asked Sep 28 '22 by bivouac0

2 Answers

Don't make two different theano.functions.

A theano.function takes a symbolic relationship, optimizes it, and compiles it. What you are doing here is asking theano.scan (and thus out_fn) to consider a compiled function as a symbolic relationship. Whether you could technically get that to work I'm not sure, but it goes against the idea of Theano.

Since I don't know what your cost function does here I can't give an exact example, but here's a quick example which does work and should be similar enough to what I think you're trying to do.

import numpy as np
import theano
import theano.tensor as T

x = theano.shared(np.asarray([7.1, 2.2, 3.4], dtype=np.float32))

def fv(v):
    # Symbolic sum of squares over the vector v
    res, _ = theano.scan(lambda x: x ** 2, v)
    return T.sum(res)

def f(i):
    # Apply fv to a two-element slice of the shared variable
    return fv(x[i:i+2])

outs, _ = theano.scan(f, T.arange(2))

fn = theano.function([], outs)
fn()
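For reference, here is the same computation in plain NumPy, with no Theano involved, so you can see what the scan above evaluates to: each step takes a two-element slice of x, squares it elementwise, and sums. The values are the ones from the snippet above.

```python
import numpy as np

x = np.asarray([7.1, 2.2, 3.4], dtype=np.float32)

# Equivalent of the scan: for each i in 0..1, sum of squares of x[i:i+2]
outs = [float(np.sum(x[i:i+2] ** 2)) for i in range(2)]
print(outs)  # approximately [55.25, 16.4], up to float32 rounding
```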
answered Oct 07 '22 by Steve Hastings


After some investigation I agree that calling a function from a function is not correct. The challenge with the code is that, following the basic design of the deep-learning tutorials, the first layer of the net has a symbolic variable defined as its input, and the output is propagated up through the higher layers until a final cost is computed from the top layer. The tutorials use code something like...

class layer1(object):
    def __init__(self):
        self.x = T.matrix()
        # self.W and self.b are the layer's shared parameters and
        # activation is e.g. T.tanh; their definitions are omitted here
        self.output = activation(T.dot(self.x, self.W) + self.b)

For me, the tensor variable (layer1.x) needs to change every time scan takes a step so it holds a new slice of data. The "givens" statement in a function does that, but since calling a compiled theano function from inside a "scan" doesn't work, there are two other solutions I was able to find...

1 - Rework the network so that its cost function is based on a series of function calls instead of a propagated variable. This is technically simple but requires a bit of re-coding to get things organized properly in a multi-layer network.

2 - Use theano.clone inside of scan. That code looks something like...

def step(curr):
    y_in = y[curr]  # y holds the targets; defined elsewhere
    replaces = {tn.layer1.x: x[curr:curr+1]}
    # Rebuild the cost graph with the layer input swapped for this slice
    return theano.clone(tn.cost(y_in), replace=replaces)
outs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])

Both methods return the same results and appear to execute at the same speed.
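To see what the per-step slicing in the clone approach amounts to, here is a plain-NumPy sketch of the same idea: each index feeds a one-row slice of x through the layer. The W, b, and tanh activation here are made-up stand-ins, not the real network's values.

```python
import numpy as np

x = np.asarray([[7.1], [2.2], [3.4]], dtype=np.float32)  # 3 rows, 1 feature
W = np.asarray([[0.5, -0.5]], dtype=np.float32)          # hypothetical weights
b = np.asarray([0.1, 0.2], dtype=np.float32)             # hypothetical bias

# Analogue of scan + clone: for each index, push the slice x[curr:curr+1]
# through the layer and collect one activation per step
outs = [np.tanh(x[curr:curr+1].dot(W) + b) for curr in range(2)]
print([o.shape for o in outs])  # each step yields a (1, 2) activation
```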

answered Oct 07 '22 by bivouac0