The right way to define a function in theano?

Background:

Usually I define a Theano function with inputs like x = fmatrix(). However, while modifying Keras (a deep-learning library based on Theano) to make it work with the CTC cost, I noticed a very weird problem: if one input of the cost function is declared as

x = tensor.zeros(shape=[M,N], dtype='float32')

instead of

x = fmatrix()

the training process will converge much faster.
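
To make the contrast concrete, here is a minimal sketch of the two declaration styles (the shape M, N and the variable names are hypothetical placeholders for illustration, not the actual Keras/CTC code):

import theano
from theano import tensor

M, N = 10, 20  # hypothetical shape, for illustration only

# the usual way: a free symbolic input that receives its value at call time
x_free = tensor.fmatrix('x')

# the variant that changed the behavior: the output of a zeros (alloc) op
x_zeros = tensor.zeros(shape=[M, N], dtype='float32')

# either variable is then used to build the cost expression and listed
# among the inputs of theano.function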

A simplified problem:

The full code is quite big, so I have tried to reduce the problem to the following: take a function that computes the Levenshtein edit distance:

import theano
from theano import tensor
from theano.ifelse import ifelse

def editdist(s, t):
    def update(x, previous_row, target):
        # build the next row of the dynamic-programming table:
        # deletion, then substitution, then insertion
        current_row = previous_row + 1
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], tensor.add(previous_row[:-1], tensor.neq(target, x))))
        current_row = tensor.set_subtensor(current_row[1:], tensor.minimum(current_row[1:], current_row[0:-1] + 1))
        return current_row

    # scan over the longer sequence; the shorter one becomes the target row
    source, target = ifelse(tensor.lt(s.shape[0], t.shape[0]), (t, s), (s, t))
    previous_row = tensor.arange(target.size + 1, dtype=theano.config.floatX)
    result, updates = theano.scan(fn=update, sequences=source, outputs_info=previous_row, non_sequences=target, name='editdist')
    return result[-1, -1]

Then I define two functions, f1 and f2:

# f1: free symbolic inputs
x1 = tensor.fvector()
x2 = tensor.fvector()
r1 = editdist(x1, x2)
f1 = theano.function([x1, x2], r1)

# f2: graph-derived zeros variables used as the inputs
x3 = tensor.zeros(3, dtype='float32')
x4 = tensor.zeros(3, dtype='float32')
r2 = editdist(x3, x4)
f2 = theano.function([x3, x4], r2)

When computing with f1 and f2, the results are different:

>>> f1([1, 2, 3], [1, 3, 3])
array(1.0)

>>> f2([1, 2, 3], [1, 3, 3])
array(3.0)

f1 gives the right result, but f2 doesn't.

So my question is: what is the right way to define a Theano function? And what actually went wrong with f2?

Update:

I'm using Theano version 0.8.0.dev0. I just tried Theano 0.7.0, and both f1 and f2 give the correct result. Maybe this is a bug in Theano?

Update 1 (1-27-2016):

According to @lamblin's explanation on this issue (https://github.com/Theano/Theano/issues/3925#issuecomment-175088918), this was actually a bug in Theano and has been fixed in the latest (1-26-2016) version. For convenience, lamblin's explanation is quoted here:

"The first way is the most natural one, but in theory both should be equivalent. x3 and x4 are created as the output of an "alloc" operation, the input of which would be the constant 3, rather than free inputs like x1 and x2, but that should not matter since you pass [x3, x4] as inputs to theano.function, which should cut the computation graph right there.

My guess is that scan is optimizing prematurely, believing that x3 or x4 is guaranteed to always be the constant 0, and does some simplifications that proved incorrect when values are provided for them. That would be an actual bug in scan."
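
To illustrate what "cutting the computation graph" at the inputs means, here is a minimal sketch (not from the original post): a graph-derived zeros variable is listed as a function input, so the value supplied at call time should replace it rather than the zeros computation.

import numpy as np
import theano
from theano import tensor

# x is not a free input; it is the output of an op that would compute zeros
x = tensor.zeros(3, dtype='float32')
y = x + 1.0

# listing x among the inputs should cut the graph at x, so the value passed
# at call time is used instead of the zeros computation
f = theano.function([x], y)
print(f(np.asarray([1, 2, 3], dtype='float32')))  # expected: [ 2.  3.  4.]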

Update 2 (1-27-2016):

Unfortunately, the bug is not totally fixed yet. In the background section I mentioned that convergence is much faster if one input of the cost function is declared as tensor.zeros(); I've now found the reason: when the input is declared as tensor.zeros(), the cost function gives an incorrect result, and mysteriously this helps convergence. I put together a simplified reproduction demo here (https://github.com/daweileng/TheanoDebug); run ctc_bench.py and you can see the results.

Asked Jan 25 '16 by Jedi

1 Answer

theano.tensor.zeros(...) cannot take any value other than 0.

Unless, of course, you add nodes to the graph and modify parts of the zeros tensor using theano.tensor.set_subtensor.

The input tensor theano.tensor.fmatrix can take any value you pass in.
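
A rough sketch of that distinction (the variable names here are illustrative, not from the question):

import numpy as np
import theano
from theano import tensor

# a zeros tensor is a fixed computation in the graph, not a placeholder;
# only explicit graph nodes such as set_subtensor change its contents
z = tensor.zeros((2, 3), dtype='float32')
z_mod = tensor.set_subtensor(z[0, :], 5.0)

# a free input like fmatrix takes whatever value is passed at call time
x = tensor.fmatrix('x')

f = theano.function([x], [z_mod, x + 1.0])
z_val, x_val = f(np.ones((2, 3), dtype='float32'))
# z_val == [[5, 5, 5], [0, 0, 0]]; x_val == the passed-in matrix plus 1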

Answered Sep 20 '22 by eickenberg