I need to execute a Theano function a number of times via scan in order to sum up a cost function and use it in a gradient computation. I'm familiar with the deep-learning tutorials that do this, but my data slicing and some other complications mean I need to do this a little differently. Below is a much simplified version of what I'm trying to do:
import numpy
import theano
import theano.tensor as T

tn = testnet()           # the network class, defined elsewhere
cost = tn.single_cost()

x = theano.shared(numpy.asarray([7.1, 2.2, 3.4], dtype='float32'))
index = T.lscalar('index')
test_fn = theano.function(inputs=[index], outputs=cost,
                          givens={tn.x: x[index:index + 1]})

def step(curr):
    # Problem line: test_fn is already compiled, but scan passes
    # curr in as a symbolic variable, not a numeric index.
    return T.constant(test_fn(curr))

outs, _ = theano.scan(step, T.arange(2))
out_fn = theano.function(inputs=[], outputs=outs)
print(out_fn())
In the scan function, the call to test_fn(curr) gives the error: "Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?"
Even if I pass in an array of values instead of the T.arange(2), I still get the same error. Is there a reason you can't call a function from scan?
In general, I'm wondering whether there is a way to call a function like this with a series of indexes so that the output can feed into a T.grad() computation (not shown).
Don't make two different theano.functions. A theano.function takes a symbolic relationship, optimizes it, and compiles it. What you are doing here is asking theano.scan (and thus out_fn) to consider a compiled function as a symbolic relationship. Whether you could technically get that to work, I'm not sure, but it goes against the idea of Theano.
Since I don't know what your cost function does, I can't give an exact example, but here's a quick example that does work and should be similar enough to what I think you're trying to do:
import numpy as np
import theano
import theano.tensor as T

x = theano.shared(np.asarray([7.1, 2.2, 3.4], dtype=np.float32))

def fv(v):
    # Inner scan: square each element of v, then sum the squares.
    res, _ = theano.scan(lambda e: e ** 2, v)
    return T.sum(res)

def f(i):
    # Symbolic cost over a two-element slice of the shared data.
    return fv(x[i:i + 2])

outs, _ = theano.scan(f, T.arange(2))
fn = theano.function([], outs)
fn()
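Because outs is still a symbolic variable, it can feed straight into the T.grad() computation the question mentions. Here's a minimal sketch of that hookup; the total and grad_fn names are my own, not part of the answer above:

# On the sample data, fn() evaluates to approximately [55.25, 16.4]
# (the sum of squares over each two-element slice).
total = T.sum(outs)                              # scalar cost over all slices
grad_fn = theano.function([], T.grad(total, x))  # differentiate w.r.t. the shared x
print(grad_fn())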
After some investigation I agree that calling a compiled function from inside another function is not correct. The challenge with the code is that, following the basic design of the deep-learning tutorials, the first layer of the net has a symbolic variable defined as its input, and the output is propagated up through higher layers until a final cost is computed from the top layer. The tutorials use code something like:
class layer1(object):
    def __init__(self):
        # W, b, and activation are defined elsewhere in the tutorial code.
        self.x = T.matrix()  # symbolic input, fixed at construction time
        self.output = activation(T.dot(self.x, self.W) + self.b)
For me, the tensor variable (layer1.x) needs to change to a new slice of the data every time scan takes a step. The "givens" statement in a compiled function does that, but since calling a compiled Theano function from inside a "scan" doesn't work, there are two other solutions I was able to find:
1 - Rework the network so that its cost function is based on a series of function calls instead of a propagated variable. This is technically simple but requires a bit of re-coding to get things organized properly in a multi-layer network. (A sketch of this approach is at the end of this answer.)
2 - Use theano.clone inside scan. That code looks something like this:
def step(curr):
    y_in = y[curr]
    replaces = {tn.layer1.x: x[curr:curr + 1]}
    # Clone the symbolic cost graph, swapping the layer's input
    # variable for the current slice of the shared data.
    fn = theano.clone(tn.cost(y_in), replace=replaces)
    return fn

outs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])
Both methods return the same results and appear to execute at the same speed.
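For reference, here is a minimal sketch of what option 1 could look like. None of this is from the tutorial code: the Layer class, the toy parameters, and the stand-in cost are all hypothetical. The point is just that the layer builds its output from whatever input expression it is handed, rather than from a stored self.x:

import numpy as np
import theano
import theano.tensor as T

class Layer(object):
    def __init__(self, W, b, activation):
        self.W, self.b, self.activation = W, b, activation

    def output(self, x):
        # Build the output from the input expression passed in,
        # instead of from a self.x fixed at construction time.
        return self.activation(T.dot(x, self.W) + self.b)

# Toy data and parameters, just to make the sketch self-contained.
x = theano.shared(np.asarray([[7.1], [2.2], [3.4]], dtype=np.float32))
W = theano.shared(np.ones((1, 2), dtype=np.float32))
b = theano.shared(np.zeros(2, dtype=np.float32))
layer = Layer(W, b, T.nnet.sigmoid)

def step(curr):
    # Each scan step hands the layer a fresh slice of the shared data,
    # so no givens substitution (and no theano.clone) is needed.
    return T.sum(layer.output(x[curr:curr + 1]))  # stand-in for the real cost

outs, _ = theano.scan(step, sequences=[T.arange(2)])
total_cost = T.sum(outs)  # ready to feed T.grad(total_cost, [W, b])
fn = theano.function([], outs)
print(fn())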