I have the following code
output = T.switch(cond, a, b)
where cond is a (N, 1) bool Tensor, while a and b are (N, M) numeric Tensors with M being quite large. The condition applies in a row-wise manner.
I can easily make the switch work by running T.repeat() on cond, but this is quite slow. Is there a way I can efficiently make the bools in cond decide whether a or b should be returned?
Yes, you could do
cond * a + (1-cond) * b
cond will be broadcast to shape (N, M). (In Theano, the second dimension of cond has to be marked broadcastable for this to work, e.g. with T.addbroadcast(cond, 1), or by declaring the variable as a T.col.)
This should be close to the theoretical limit, which is memory bandwidth: an ideal row-wise select needs to read about N*M elements and write N*M.
The blend reads 2*N*M instead, but removes the per-element conditional logic.
(I don't have Theano in front of me, so I'm not sure whether it is actually faster than T.switch, but it should be about as good as it gets. I'd also try casting cond to the same dtype as a and b.)
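To make that concrete, here is a minimal, self-contained sketch of the blend approach (the toy sizes N, M and the use of T.col to get a column whose second dimension is broadcastable are my assumptions, not from the question):

import numpy as np
import theano
import theano.tensor as T

# T.col() declares an (N, 1) column whose second dimension is
# broadcastable, so elementwise ops expand it to (N, M).
cond = T.col('cond')  # expected to hold 0/1 values, same dtype as a and b
a = T.matrix('a')
b = T.matrix('b')

# Row-wise select: rows where cond == 1 come from a, the rest from b.
output = cond * a + (1 - cond) * b
blend_fn = theano.function([cond, a, b], output)

floatX = theano.config.floatX
N, M = 4, 5  # toy sizes just for the demo
cond_np = (np.random.rand(N, 1) > 0.5).astype(floatX)  # bools cast to floatX
a_np = np.random.uniform(size=(N, M)).astype(floatX)
b_np = np.random.uniform(size=(N, M)).astype(floatX)
print(blend_fn(cond_np, a_np, b_np))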
If you want to update a in place, you can do it using T.set_subtensor:
import numpy as np
import theano
import theano.tensor as T

N, M = 1000, 10000  # example sizes, assuming M is large

a = np.random.uniform(size=(N, M)).astype(np.float32)
b = np.random.uniform(size=(N, M)).astype(np.float32)
a = theano.shared(a)
b = theano.shared(b)
c = T.vector()  # mostly 0; presumably (1 - cond)
nz = T.nonzero(c)  # indices of the nonzero entries, i.e. the rows to overwrite
s = T.set_subtensor(a[nz], b[nz])  # copy the selected rows of b into a
fn = theano.function([c], [], updates=[(a, s)])
...
fn((1 - cond).astype(theano.config.floatX))  # cast the flipped bool mask to the vector's dtype
It may or may not be faster than the first approach, depending on N, M and other factors.
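If in doubt, a quick timing check will tell you which variant wins for your actual N and M; a small hypothetical helper (bench is my name, not a Theano API) could look like:

import timeit

# Time a compiled Theano function, excluding the first (warm-up) call.
def bench(fn, *args, repeats=100):
    fn(*args)  # warm-up: exclude one-off compilation/transfer costs
    return timeit.timeit(lambda: fn(*args), number=repeats) / repeats

# e.g. compare bench(blend_fn, cond_np, a_np, b_np) against
# bench(fn, mask_np) for the in-place version above.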