I have some data represented by input_x. It is a tensor of unknown size (it is fed in batches), and each item in it has size n. input_x goes through tf.nn.embedding_lookup, so that embed has dimensions [?, n, m], where m is the embedding size and ? is the unknown batch size.
This is described here:
    input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
    embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by the embedding dimension) by a matrix variable, U, and I can't seem to get how to do that.
I first tried using tf.matmul, but it gives an error due to a shape mismatch. I then tried the following, expanding the dimensions of U and applying batch_matmul (I also tried the function from tf.nn.math_ops; the result was the same):
    U = tf.Variable( ... )
    U1 = tf.expand_dims(U, 0)
    h = tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening: I expanded U with a new leading dimension, which has size 1, but the minibatch size, 64, does not match it.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?
Previous answers are obsolete. Currently tf.matmul() supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.
Also, tf.batch_matmul() was removed, and tf.matmul() is the right way to do batch multiplication. The main idea can be understood from the following code:
    import tensorflow as tf

    batch_size, n, m, k = 10, 3, 5, 2
    A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
    B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
    tf.matmul(A, B)
Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on. Assume you have batch_size matrices of size n x m and batch_size matrices of size m x k. For each pair of them, you calculate (n x m) X (m x k), which gives you an n x k matrix. You will have batch_size of them.
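To make the pairing concrete, here is a minimal plain-Python sketch of what the batched product computes. It is only a reference for the semantics, not how TensorFlow implements it; the helper names matmul_2d and batch_matmul are made up for this illustration, and the matrices are nested lists filled with constants.

```python
def matmul_2d(a, b):
    """Multiply an n x m matrix by an m x k matrix (both given as lists of rows)."""
    m = len(b)
    assert all(len(row) == m for row in a), "inner dimensions must match"
    k = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(m)) for j in range(k)]
            for row in a]


def batch_matmul(A, B):
    """Pair matrices along the leading (batch) axis and multiply each pair."""
    assert len(A) == len(B), "batch sizes must match"
    return [matmul_2d(a, b) for a, b in zip(A, B)]


# Same sizes as in the snippet above; the constant values are arbitrary.
batch_size, n, m, k = 10, 3, 5, 2
A = [[[1.0] * m for _ in range(n)] for _ in range(batch_size)]
B = [[[2.0] * k for _ in range(m)] for _ in range(batch_size)]

C = batch_matmul(A, B)
result_shape = (len(C), len(C[0]), len(C[0][0]))  # (batch_size, n, k)
```

The batch-size check in batch_matmul mirrors the error from the question: a leading dimension of 1 cannot be paired with a leading dimension of 64. tf.matmul(A, B) applies the same pairing to tensors.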
Notice that tf.matmul() also accepts more than one leading batch dimension, so something like this is also valid:
    A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
    B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
    tf.matmul(A, B)
and will give you a tensor of shape (a, b, n, k).
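For the shapes in the question, where U is a single [m, m] matrix shared by every sample, one possible way to apply this is to copy U along a new batch axis whose length is read at run time, and then call tf.matmul. This is a sketch under assumptions, not a definitive answer: the sizes are made up, tf.random.normal stands in for the real embedding output and for the variable U, and the code assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

# Hypothetical sizes for illustration; in the question the batch size is unknown.
batch_size, n, m = 4, 3, 5

embed = tf.random.normal((batch_size, n, m))  # stand-in for the embedding output
U = tf.random.normal((m, m))                  # stand-in for the matrix variable U

# Read the batch size at run time, so it does not need to be known in advance.
dynamic_batch = tf.shape(embed)[0]

# Copy U once per sample: [m, m] -> [1, m, m] -> [batch, m, m].
U_tiled = tf.tile(tf.expand_dims(U, 0), [dynamic_batch, 1, 1])

# Each [n, m] sample is multiplied by its own copy of U.
h = tf.matmul(embed, U_tiled)  # shape [batch, n, m]
```

Tiling makes the leading dimensions of both operands equal, which is exactly what the error message in the question complains about.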